# Discord Chat Embeddings Visualizer

A Streamlit application that visualizes Discord chat messages using their vector embeddings in 2D space.

## Features

- **2D Visualization**: View chat messages plotted using PCA or t-SNE dimension reduction
- **Interactive Plotting**: Hover over points to see message content, author, and timestamp
- **Filtering**: Filter by source chat log files and authors
- **Multiple Datasets**: Automatically loads all CSV files from the discord_chat_logs folder

## Installation

1. Install the required dependencies:

```bash
pip install -r requirements.txt
```

## Usage

Run the Streamlit application:

```bash
streamlit run streamlit_app.py
```

The app will automatically load all CSV files from the `../../discord_chat_logs/` directory.

## Data Format

The application expects CSV files with the following columns:

- `message_id`: Unique identifier for the message
- `timestamp_utc`: When the message was sent
- `author_id`: Author's Discord ID
- `author_name`: Author's username
- `author_nickname`: Author's server nickname
- `content`: The message content
- `attachment_urls`: Any attached files
- `embeds`: Embedded content
- `content_embedding`: Vector embedding of the message content (as a string representation of a list)

## Visualization Options

- **PCA**: Principal Component Analysis - faster, good for getting an overview
- **t-SNE**: t-Distributed Stochastic Neighbor Embedding - slower but may reveal better clusters

## Controls

- **Dimension Reduction Method**: Choose between PCA and t-SNE
- **Filter by Source Files**: Select which chat log files to include
- **Filter by Authors**: Select which authors to display
- **Show Data Table**: View the underlying data in table format

## Performance Notes

- For large datasets, consider filtering by authors or source files to improve performance
- t-SNE is computationally intensive and may take longer with large datasets
- The app caches data and computations for better performance
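
## Example: Parsing `content_embedding`

The Data Format section notes that `content_embedding` is stored as a string representation of a list, so each cell has to be converted back into numbers before any dimension reduction can run. The loader in `streamlit_app.py` is not shown in this README, so the following is only a minimal sketch of one way to do that conversion using the Python standard library. The helper names `parse_embedding` and `load_rows` and the sample rows are hypothetical, and the choice to skip rows with missing or malformed embeddings is an assumption rather than documented app behavior.

```python
import ast
import csv
import io


def parse_embedding(raw):
    """Convert a string such as "[0.1, 0.2]" into a list of floats.

    Returns None when the cell is empty or is not a flat list of numbers.
    (Hypothetical helper; not taken from streamlit_app.py.)
    """
    if not raw or not raw.strip():
        return None
    try:
        # literal_eval only accepts Python literals, so it does not execute code.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return None
    if not isinstance(value, (list, tuple)) or not value:
        return None
    for x in value:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            return None
    return [float(x) for x in value]


def load_rows(csv_file):
    """Yield (row, embedding) pairs, skipping rows without a usable embedding."""
    for row in csv.DictReader(csv_file):
        embedding = parse_embedding(row.get("content_embedding") or "")
        if embedding is not None:
            yield row, embedding


# Made-up sample data using a subset of the documented columns.
SAMPLE_CSV = '''message_id,author_name,content,content_embedding
1,alice,hello,"[0.1, 0.2, 0.3]"
2,bob,hi there,"[0.4, 0.5, 0.6]"
3,carol,broken row,not-a-list
4,dave,empty embedding,
'''

rows = list(load_rows(io.StringIO(SAMPLE_CSV)))
for row, embedding in rows:
    print(row["author_name"], len(embedding))
```

With the sample above, only the first two rows survive parsing. The resulting lists of floats can then be stacked into a matrix and passed to whichever PCA or t-SNE implementation the app uses.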