Backlog - Discord Binding
Directory
Next Steps
TODO
To Sort
Priority
Documentation + Writing
Discord Scraping Procedures
SQL vs NoSQL blog post
Postgres to DuckDB/CSV Tutorial
Benchmark everything and write a blog post
Generate Synthetic Dataset / Get Permission to use a guild as a public dataset
Blog post on the ETL pipeline
Aggregations would be much harder, stay with SQL till the end
Question Database with Embeddings
When do we start tagging the messages?
We need systems of tags that are composable
We want to be able to segment the data into contextualized conversations
Just build the damn Discord Binding dashboard
Better Unit Testing, PyTest, and document it
Better Logging throughout the application, write out control flow logic
Add descriptions to all the functions
Complete all the TODO queries in Questions for Discord Data
Spark Binding
Interviews
Research
Local Graph Datastructures
Backlog - Discord Scraping
Discord ETL - Base
JSON File SQL Validator
Guild Specific Indexing from S3 and file system data
We need a list of all files in the S3 Buckets accessible to the API
How do we want to do labeling of our data
message_urls_t needs guilds_t, author_guild_id, and channel_id
What discord author was mentioned the most?
JSON Neo4J Validator
Job Queue for Queries
I need to test mentions
Discord Binding Fix Roles Bug
Add Constraints for Authors_t, Attachments_t, and Roles_t for DAO Auditing via Discord project
Add Constraints to all Tables
Postgres to SQLite Research, Tutorial, and Script
Index all S3 objects to key value store
Review Graph QL Alchemy
Come up with all the Cypher Queries we should do
ETL Backlog
Test requirements.txt
Make URL indexing optional via environment variable
Separate the database indexing code, search insert files
Add Constraints Retroactively
Add missing SQL constraints
Figure out how to add data that is missing into the database
https://chat.openai.com/share/53ed5b7b-95b6-4a08-ae16-417b272d749b
Validate all data is ingested correctly and effectively, manually, have scripts test json files for data in database
Optimize Roles indexing
messages_urls_tables
SQL Alchemy indexing data
Reply Table
SQL Alchemy indexing data
Neo4J indexing Data
For SQLAlchemy, do a plain insert rather than an upsert: run a manual select first so that nothing Postgres-native is required
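A minimal sketch of the select-then-insert idea above. It uses the standard library sqlite3 module instead of SQLAlchemy so that it runs on its own, and the helper name insert_if_missing plus the authors_t columns shown are assumptions for illustration, not the project's actual schema. The select and the insert are two separate statements, so this is only safe when a single writer is loading the data.

```python
import sqlite3


def insert_if_missing(conn, table, key_column, row):
    """Insert `row` only if no row with the same key exists.

    Returns True when a row was inserted, False when it was already there.
    `table` and `key_column` are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    """
    found = conn.execute(
        f"SELECT 1 FROM {table} WHERE {key_column} = ? LIMIT 1",
        (row[key_column],),
    ).fetchone()
    if found is not None:
        return False
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )
    return True


# Hypothetical table layout, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors_t (author_id TEXT PRIMARY KEY, name TEXT)")

author = {"author_id": "a1", "name": "alice"}
first_attempt = insert_if_missing(conn, "authors_t", "author_id", author)
second_attempt = insert_if_missing(conn, "authors_t", "author_id", author)
row_count = conn.execute("SELECT COUNT(*) FROM authors_t").fetchone()[0]
```

The same two-step shape should carry over to a SQLAlchemy session (query for the key, then add the object if nothing came back), which keeps the loader portable across Postgres and SQLite.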
Neo4j Specific
Emoji Nodes
Use author nodes not author_guild_id nodes
NLP on Discord Data
Named Entity Recognition in Graph Database
Label the Channels according to a schema, for example which channel is Announcements
Question Extraction
Write
Explain how I want to annotate my raindrop and hypothesis
Wikipedia / Wikidata Schema
Intent Description UI, we need to be able to do query and then provide description of what we see
Research prompts to use for labeling users and their data, write multiple blog posts
We need to come up with a long list of questions for each DAO
queries.py - Base Queries
querys.py - Caching Job Service
Create a file for this
Do we want to use a module for this or develop an API
We should be able to do both, just send JSON back and forth
graphs.py
Backend
Hire a Mentor to tell me which Python API is best
RBAC like Google Docs Sharing Links
Dockerize
Query Caching using IndexedDB
Use Discord ETL, Job Queue
Use Discord ETL, Index all S3 objects to key value store
Document API, OpenAPI
Come up with tags and collections for tags
History of Queries
Manual Labeling Data
Brainstorming Labels and Buckets for the data
Frontend
Order the Users component, alphabetically, most messages, most messages in channels, most attachments, min message count, etc. etc.
Drawer Component should move back when clicking on the background
Add component to select specific URLs for doing URL queries
Frontend - Add padding on the drawer component
Data Visualization Component
Datagrid Component that takes Pandas JSON
Long Term
Additional Data Sources ETL, Create a schema for twitter and youtube data for indexing profiles and other metadata about content
Research Example Project Roadmaps
Updated Backlog - Discord Binding
Updated ETL Diagram
Don't use JSON tables anymore
Neo4J was added
Created Application Dependency Diagram
Jupyter Notebook Report I can send people as PDF and HTML
Reorganized Questions for Discord Data
I also made a funny discovery. I have yet to find a really good Cypher query that I can't also do in a relational database on the Discord data. But keeping track of all the SQL queries I am writing is getting so complicated that it may make sense to put them in a graph database. I can put all the queries in queries.py into a graph schema that the frontend can use to make selecting queries easier.
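One way the "queries as a graph" idea could look, sketched with plain dictionaries before committing to a graph database. The query names below are placeholders and are not taken from queries.py; the table names reuse ones mentioned in this backlog. Each query is linked to the tables it reads, and two queries count as related when they share a table.

```python
from collections import defaultdict

# Hypothetical catalog: query name -> tables it reads.
# These query names are placeholders, not the contents of queries.py.
QUERY_TABLES = {
    "most_mentioned_author": ["messages_t", "authors_t"],
    "most_shared_domains": ["messages_t", "message_urls_t"],
    "top_reactions": ["reactions_t"],
}


def build_table_index(query_tables):
    """Invert the catalog: table name -> set of queries that read it."""
    index = defaultdict(set)
    for query, tables in query_tables.items():
        for table in tables:
            index[table].add(query)
    return index


def related_queries(query, query_tables, table_index):
    """Return the other queries that share at least one table with `query`."""
    related = set()
    for table in query_tables[query]:
        related |= table_index[table]
    related.discard(query)
    return sorted(related)


TABLE_INDEX = build_table_index(QUERY_TABLES)
```

A frontend selector could call related_queries to suggest follow-up queries once a user picks one, and the same query-to-table edges would map directly onto nodes and relationships if the catalog later moves into Neo4J.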
Conducted interview on Fiverr
plotly_graph API Endpoint that returns Plotly JSON
list_graphs API endpoint to return graphs and their data
Plotly component for react frontend
SelectDataVisualization react component
Fetch data from /list_graphs
Update Context with Data Visualization metadata from /list_graphs
Use context updated from SelectDataVisualization in the PlotlyGraph React component
Fetch data from /plotly_graphs
Render data from /plotly_graphs in react
React setup
Appbar
Drawer
Select Guild, Channel, Author Component
Context Setup
Proxy
Plotly Graph Python Module and Graph Jupyter Notebook
One graph complete many more to go
Tested Django REST Framework, decided not to use it
Create example REST API with hard coded responses
Get basic endpoint query working and returning data from database
Write basic tests for API
Django API returns queries.py data
Got core set of queries running in queries.py and run _test_queries.py
Have script to just create the SQL schema
Tested constraints and added more
URL + Domain Extraction
Script to retroactively add some constraints
Added URLs class to SQLAlchemy ORM
Neo4J now indexes URLs and Domains
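For reference, URL and domain extraction from message content can be sketched with the standard library alone. This is an assumption about the approach, not the project's actual implementation; the regular expression is deliberately simple and the function name is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Simple pattern: anything starting with http:// or https:// up to the next
# whitespace or angle bracket (Discord allows wrapping links in <...>).
URL_PATTERN = re.compile(r"https?://[^\s<>]+")

# Punctuation that often trails a URL in chat text but is not part of it.
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def extract_urls_and_domains(content):
    """Return a list of (url, domain) pairs found in a message body."""
    pairs = []
    for match in URL_PATTERN.findall(content):
        url = match.rstrip(TRAILING_PUNCTUATION)
        domain = urlsplit(url).hostname  # lowercased by urlsplit, None if absent
        if domain:
            pairs.append((url, domain))
    return pairs


example = "see https://chat.openai.com/share/abc, and http://Example.org."
example_pairs = extract_urls_and_domains(example)
```

Stripping trailing punctuation will also cut a closing parenthesis that legitimately belongs to a URL, so a real loader may want a stricter rule for those cases.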
Reply data is not parsed from JSON
count column name in reactions_t should be reaction_count
rename content as msg_content in messages_t
rename content_length as msg_content_length in messages_t
Create a directory for Logs
CSV for Analytics files need their own directory
Fix the last couple weird column_names that don't use underscores
Fixed guilds_t.iconUrl to icon_url
Fixed messages_t.isPinned to be is_pinned
Test ORM on Native Postgres loaded queries
SQLAlchemy: add the Reply class
Neomodel for Neo4J classes complete
Ingest of most data into neo4j complete