Discord Chat Embeddings Visualizer
A Streamlit application that visualizes Discord chat messages using their vector embeddings in 2D space.
Features
- 2D Visualization: View chat messages plotted using PCA or t-SNE dimension reduction
- Interactive Plotting: Hover over points to see message content, author, and timestamp
- Filtering: Filter by source chat log files and authors
- Multiple Datasets: Automatically loads all CSV files from the discord_chat_logs folder
Installation
- Install the required dependencies:
pip install -r requirements.txt
Usage
Run the Streamlit application:
streamlit run streamlit_app.py
The app will automatically load all CSV files from the ../../discord_chat_logs/ directory.
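The loading step described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the app's actual code: the function name `load_chat_logs` and the added `source_file` column are assumptions made here so that rows can later be filtered by the file they came from.

```python
import csv
from pathlib import Path


def load_chat_logs(folder):
    """Read every *.csv file in `folder` and tag each row with its source file."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # "source_file" is a hypothetical column name used for filtering.
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Calling `load_chat_logs("../../discord_chat_logs")` would return one dictionary per message, in file-name order.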
Data Format
The application expects CSV files with the following columns:
- message_id: Unique identifier for the message
- timestamp_utc: When the message was sent
- author_id: Author's Discord ID
- author_name: Author's username
- author_nickname: Author's server nickname
- content: The message content
- attachment_urls: Any attached files
- embeds: Embedded content
- content_embedding: Vector embedding of the message content (as a string representation of a list)
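Because content_embedding is stored as text, it has to be converted back to numbers before any dimension reduction. A minimal sketch, assuming the string looks like a Python list literal such as "[0.1, -0.2, 0.3]" (the helper name `parse_embedding` is hypothetical):

```python
import ast


def parse_embedding(text):
    """Turn a stringified list such as "[0.1, -0.2]" into a list of floats."""
    # ast.literal_eval only evaluates literals, so it is safer than eval().
    values = ast.literal_eval(text)
    if not isinstance(values, (list, tuple)):
        raise ValueError("embedding must be a list of numbers")
    return [float(v) for v in values]
```

If the files were written in a different format (for example, space-separated values without commas), this parser would need to be adapted.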
Visualization Options
- PCA: Principal Component Analysis - faster, good for getting an overview
- t-SNE: t-Distributed Stochastic Neighbor Embedding - slower, but may show cluster structure more clearly
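To illustrate what the PCA option does, here is a small NumPy-only sketch that projects embeddings onto their first two principal components. It is an illustration under stated assumptions, not the app's implementation, which may well rely on a library such as scikit-learn; the function name `pca_2d` is hypothetical. t-SNE is iterative and is not sketched here.

```python
import numpy as np


def pca_2d(embeddings):
    """Project an (n_samples, n_features) array onto its top two principal axes."""
    X = np.asarray(embeddings, dtype=float)
    # Center each feature so the projection measures variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:2].T
```

Each row of the result is an (x, y) coordinate that can be plotted directly, with one point per message.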
Controls
- Dimension Reduction Method: Choose between PCA and t-SNE
- Filter by Source Files: Select which chat log files to include
- Filter by Authors: Select which authors to display
- Show Data Table: View the underlying data in table format
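The two filter controls amount to keeping only the rows that match the selected files and authors. A minimal sketch, assuming each row is a dictionary with the documented author_name column plus a hypothetical source_file key recording which CSV it came from:

```python
def filter_messages(rows, source_files=None, authors=None):
    """Keep rows matching the selected source files and authors.

    Passing None for a filter means "do not filter on this field".
    """
    selected = []
    for row in rows:
        # "source_file" is an assumed key, not one of the documented CSV columns.
        if source_files is not None and row["source_file"] not in source_files:
            continue
        if authors is not None and row["author_name"] not in authors:
            continue
        selected.append(row)
    return selected
```

Filtering before dimension reduction also reduces the number of points that PCA or t-SNE has to process.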
Performance Notes
- For large datasets, consider filtering by authors or source files to improve performance
- t-SNE is computationally intensive and may take longer with large datasets
- The app caches data and computations for better performance