clustermap app
58
apps/cluster_map/README.md
Normal file
@@ -0,0 +1,58 @@
# Discord Chat Embeddings Visualizer

A Streamlit application that visualizes Discord chat messages using their vector embeddings in 2D space.

## Features

- **2D Visualization**: View chat messages plotted using PCA or t-SNE dimensionality reduction
- **Interactive Plotting**: Hover over points to see message content, author, and timestamp
- **Filtering**: Filter by source chat log files and authors
- **Multiple Datasets**: Automatically loads all CSV files from the `discord_chat_logs` folder

## Installation

1. Install the required dependencies:
```bash
pip install -r requirements.txt
```

## Usage

Run the Streamlit application:

```bash
streamlit run streamlit_app.py
```

The app will automatically load all CSV files from the `../../discord_chat_logs/` directory.
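
As a rough illustration of this loading step, the sketch below collects every `*.csv` file in a directory and tags each row with the file it came from, so rows can later be filtered by source. It uses only the standard library. The function name `load_chat_logs` and the `source_file` key are hypothetical and are not taken from `streamlit_app.py`, which may load the files differently (for example with pandas).

```python
import csv
from pathlib import Path


def load_chat_logs(log_dir):
    """Read every CSV file in log_dir into one list of row dicts."""
    rows = []
    # Sort the paths so the load order is stable between runs.
    for csv_path in sorted(Path(log_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra key: remember which file a row came from.
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows
```

Calling `load_chat_logs("../../discord_chat_logs")` would resolve the relative path against the current working directory, so the result depends on where the app is launched from.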

## Data Format

The application expects CSV files with the following columns:
- `message_id`: Unique identifier for the message
- `timestamp_utc`: When the message was sent
- `author_id`: Author's Discord ID
- `author_name`: Author's username
- `author_nickname`: Author's server nickname
- `content`: The message content
- `attachment_urls`: Any attached files
- `embeds`: Embedded content
- `content_embedding`: Vector embedding of the message content (as a string representation of a list)
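
Because `content_embedding` is stored as text, it has to be converted back into numbers before any dimensionality reduction. The sketch below shows one way to do that, assuming the string is a bracketed, comma-separated list such as `"[0.1, -0.2, 0.3]"`. The exact serialization used in the chat logs is not specified here, and `parse_embedding` is a hypothetical helper name.

```python
import ast


def parse_embedding(raw):
    """Convert a string such as "[0.1, -0.2]" into a list of floats."""
    # literal_eval only accepts Python literals, so it does not execute
    # arbitrary code the way eval() would.
    values = ast.literal_eval(raw)
    if not isinstance(values, (list, tuple)):
        raise ValueError("expected a list of numbers")
    return [float(v) for v in values]
```

`ast.literal_eval` is used rather than `eval` because CSV content is untrusted input. If the embeddings were written as JSON, `json.loads` would be an equally reasonable choice.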

## Visualization Options

- **PCA**: Principal Component Analysis - faster, good for getting an overview
- **t-SNE**: t-Distributed Stochastic Neighbor Embedding - slower but may reveal better clusters
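
To make the PCA option more concrete, here is a minimal sketch of a 2D PCA projection written with NumPy's SVD. It illustrates the technique only and is not necessarily how the app computes it; libraries such as scikit-learn provide ready-made PCA and t-SNE implementations. The function name `pca_2d` is hypothetical, and the input is assumed to have at least two rows and two columns. t-SNE is not shown because it is an iterative optimization rather than a single linear projection, which is also why it tends to be slower.

```python
import numpy as np


def pca_2d(embeddings):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(embeddings, dtype=float)
    # PCA operates on mean-centered data.
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Each row of the returned array is an (x, y) coordinate that can be plotted directly. The sign of each axis is arbitrary, so two runs may produce mirrored but equivalent plots.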

## Controls

- **Dimension Reduction Method**: Choose between PCA and t-SNE
- **Filter by Source Files**: Select which chat log files to include
- **Filter by Authors**: Select which authors to display
- **Show Data Table**: View the underlying data in table format
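
The two filter controls amount to keeping only the rows whose source file and author are in the selected sets. The sketch below shows that logic in plain Python, assuming each row is a dict with the `author_name` column and a hypothetical `source_file` key recording the originating CSV. The function `filter_rows` is illustrative and is not taken from the app.

```python
def filter_rows(rows, source_files=None, authors=None):
    """Keep rows matching the selected source files and authors.

    A filter left as None (or empty) is treated as "no restriction".
    """
    selected = []
    for row in rows:
        if source_files and row["source_file"] not in source_files:
            continue
        if authors and row["author_name"] not in authors:
            continue
        selected.append(row)
    return selected
```

For example, `filter_rows(rows, authors={"alice"})` would keep only messages whose `author_name` is `alice`, across all source files.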

## Performance Notes

- For large datasets, consider filtering by authors or source files to improve performance
- t-SNE is computationally intensive and may take longer with large datasets
- The app caches data and computations for better performance