Backlog - Discord Binding

Paul, focus on using your own tool to run your own investigations before asking other people what they want. Identify what you want first.

Directory

Next Steps

TODO

  • To Sort
  • Priority
  • Documentation + Writing
    • Discord Scraping Procedures
    • SQL vs NoSQL blog post
    • Postgres to DuckDB/CSV Tutorial
    • Benchmark everything and write a blog post
      • Generate Synthetic Dataset / Get Permission to use a guild as a public dataset
    • Blog post on the ETL pipeline
      • Aggregations would be much harder, stay with SQL till the end
    • Question Database with Embeddings
    • When do we start tagging the messages?
    • We need systems of tags that are composable
    • We want to be able to segment the data into contextualized conversations
      • Segment by time block
    • Just build the damn Discord Binding dashboard
    • Better Unit Testing, PyTest, and document it
      • Better Logging throughout the application, write out control flow logic
      • Add descriptions to all the functions
    • Complete all the TODO queries in Questions for Discord Data
    • Spark Binding
  • Interviews
  • Research
  • Backlog - Discord Scraping
  • Discord ETL - Base
    • JSON File SQL Validator
    • Guild Specific Indexing from S3 and file system data
    • We need a list of all files in the S3 Buckets accessible to the API
    • How do we want to do labeling of our data
    • message_urls_t needs guilds_t, author_guild_id, and channel_id
    • What discord author was mentioned the most?
    • JSON Neo4J Validator
    • Job Queue for Queries
    • I need to test mentions
    • Discord Binding Fix Roles Bug
    • Add Constraints for Authors_t Attachments_t and Roles_t for DAO Auditing via Discord project
    • Add Constraints to all Tables
    • Postgres to SQLite Research, Tutorial, and Script
    • Index all S3 objects to key value store
    • Review Graph QL Alchemy
    • Come up with all the Cypher Queries we should do
    • ETL Backlog
      • Test requirements.txt
      • Make URL indexing optional via environment variable
      • Separate the database indexing code, search insert files
      • Add Constraints Retroactively
        • Add missing SQL constraints
        • Figure out how to add data that is missing into the database
        • https://chat.openai.com/share/53ed5b7b-95b6-4a08-ae16-417b272d749b
      • Validate manually that all data is ingested correctly, and have scripts check that the data in the JSON files is present in the database
      • Optimize Roles indexing
      • messages_urls_tables
        • SQL Alchemy indexing data
      • Reply Table
        • SQL Alchemy indexing data
        • Neo4J indexing Data
        • For SQL Alchemy, do a plain insert rather than an upsert: run a manual select first, so that no Postgres-native features are required
      • Neo4j Specific
        • Emoji Nodes
        • Use author nodes not author_guild_id nodes
  • NLP on Discord Data
    • Named Entity Recognition in Graph Database
    • Label the Channels according to a schema, for example which channel is Announcements
    • Question Extraction
    • Write
      • Explain how I want to annotate my raindrop and hypothesis
    • Wikipedia / Wikidata Schema
    • Intent Description UI: we need to be able to run a query and then provide a description of what we see
    • Research prompts to use for labeling users and their data, write multiple blog posts
    • We need to come up with a long list of questions for each DAO
  • queries.py - Base Queries
  • querys.py - Caching Job Service
    • Create a file for this
      • Do we want to use a module for this or develop an API
      • We should be able to do both, just send JSON back and forth
  • graphs.py
  • Backend
    • Hire a Mentor to tell me which Python API is best
    • RBAC like Google Docs Sharing Links
    • Dockerize
    • Query Caching using IndexedDB
    • Use Discord ETL, Job Queue
    • Use Discord ETL, Index all S3 objects to key value store
    • Document API, OpenAPI
    • Come up with tags and collections for tags
    • History of Queries
    • Manual Labeling Data
    • Brainstorming Labels and Buckets for the data
  • Frontend
    • Order the Users component, alphabetically, most messages, most messages in channels, most attachments, min message count, etc. etc.
    • Drawer Component should move back when clicking on the background
    • Add component to select specific URLs for doing URL queries
    • Frontend - Add padding on the drawer component
    • Data Visualization Component
      • Datagrid Component that takes Pandas JSON
  • Long Term
    • Additional Data Sources ETL, Create a schema for twitter and youtube data for indexing profiles and other metadata about content
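
The "Segment by time block" item above could start as something like the following. This is a minimal sketch, not the project's implementation: the function name, the dict keys (channel_id, timestamp, content), and the 60-minute default are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def segment_by_time_block(messages, block_minutes=60):
    """Group messages into (channel_id, block_start) buckets.

    `messages` is an iterable of dicts with hypothetical keys
    'channel_id', 'timestamp' (timezone-aware ISO 8601 string), and 'content'.
    """
    block_seconds = block_minutes * 60
    segments = defaultdict(list)
    for msg in messages:
        ts = datetime.fromisoformat(msg["timestamp"])
        epoch = int(ts.timestamp())
        # Round down to the start of the fixed-size block.
        block_start = epoch - (epoch % block_seconds)
        key = (
            msg["channel_id"],
            datetime.fromtimestamp(block_start, tz=timezone.utc),
        )
        segments[key].append(msg)
    return dict(segments)


# Made-up sample messages, only to show the call.
messages = [
    {"channel_id": "general", "timestamp": "2023-12-03T10:05:00+00:00", "content": "hi"},
    {"channel_id": "general", "timestamp": "2023-12-03T10:40:00+00:00", "content": "hello"},
    {"channel_id": "general", "timestamp": "2023-12-03T11:15:00+00:00", "content": "later"},
    {"channel_id": "dev", "timestamp": "2023-12-03T10:10:00+00:00", "content": "build broke"},
]
segments = segment_by_time_block(messages, block_minutes=60)
```

Fixed blocks are the simplest option; a gap-based split (start a new segment after N minutes of silence) may match real conversations better and could reuse the same input shape.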

Completed 2023-12-03

  • Research Example Project Roadmaps
  • Updated Backlog - Discord Binding
    • Updated ETL Diagram
      • Don't use JSON tables anymore
      • Neo4J was added
    • Created Application Dependency Diagram
  • Jupyter Notebook Report I can send people as PDF and HTML
    • Added a bunch of queries
  • Reorganized Questions for Discord Data
  • I also made a funny discovery. I have yet to find a really good Cypher query that I can't also do in a relational database on the Discord data. But keeping track of all the SQL queries I am writing is getting so complicated that it may make sense to put the queries themselves in a graph database. I can put all the queries in queries.py into a graph schema that the frontend can use to make selecting queries easier.
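
One way to picture the "queries in a graph schema" idea is a small in-memory graph that links each query to the tables it reads. This is only a sketch under assumptions: the query names below are hypothetical and not taken from queries.py, and a real version would presumably live in Neo4J rather than a Python dict.

```python
from collections import defaultdict

# Hypothetical catalog: query name -> tables it reads.
# The query names are invented for illustration.
QUERY_TABLES = {
    "messages_per_author": ["messages_t"],
    "reactions_per_message": ["messages_t", "reactions_t"],
    "urls_per_guild": ["message_urls_t", "guilds_t"],
}


def build_graph(query_tables):
    """Build an undirected query<->table graph as an adjacency mapping."""
    graph = defaultdict(set)
    for query, tables in query_tables.items():
        for table in tables:
            graph[("query", query)].add(("table", table))
            graph[("table", table)].add(("query", query))
    return graph


def related_queries(graph, query):
    """Return the names of other queries that share a table with `query`."""
    related = set()
    for table_node in graph[("query", query)]:
        for _kind, name in graph[table_node]:
            if name != query:
                related.add(name)
    return sorted(related)


graph = build_graph(QUERY_TABLES)
```

A frontend picker could then offer "queries related to the one you just ran" by calling related_queries, or list every query that touches a selected table.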

Completed 2023-12-02

  • Conducted interview on Fiverr

Completed 2023-11-30

  • plotly_graph API Endpoint that returns Plotly JSON
  • list_graphs API endpoint to return graphs and their data
  • Plotly component for react frontend
  • SelectDataVisualization react component
    • Fetch data from /list_graphs
    • Update Context with Data Visualization metadata from /list_graphs
  • Use context updated from SelectDataVisualization in the PlotlyGraph react component
    • Fetch data from /plotly_graphs
    • Render data from /plotly_graphs in react
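
For reference, the payload an endpoint like plotly_graph returns can be as small as a dict with "data" and "layout" keys, which is the figure shape Plotly expects. The sketch below builds that JSON with the standard library only; the function name, the bar-chart choice, and the sample values are assumptions, not the actual endpoint code.

```python
import json


def build_plotly_json(title, labels, counts):
    """Serialize a minimal bar-chart figure in Plotly's data/layout shape."""
    figure = {
        "data": [{"type": "bar", "x": list(labels), "y": list(counts)}],
        "layout": {"title": {"text": title}},
    }
    return json.dumps(figure)


# Made-up sample values, only to show the call.
payload = build_plotly_json(
    "Messages per channel",
    ["general", "dev"],
    [120, 45],
)
```

On the React side, the parsed object's data and layout fields can be handed to the Plotly component unchanged.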

Completed 2023-11-29

  • React setup
    • Appbar
    • Drawer
    • Select Guild, Channel, Author Component
    • Context Setup
    • Proxy
  • Plotly Graph Python Module and Graph Jupyter Notebook
    • One graph complete many more to go

Completed 2023-11-28

  • Tested Django REST Framework, decided not to use it
    • Create example REST API with hard coded responses
    • Get basic endpoint query working and returning data from database
    • Write basic tests for API
  • Django API returns queries.py data

Completed 2023-11-27

  • Got core set of queries running in queries.py and run _test_queries.py
  • Have script to just create the SQL schema
  • Tested constraints and added more
  • URL + Domain Extraction
  • Script to retroactively add some constraints
  • Added URLs class to SQLAlchemy ORM
  • Neo4J now indexes URLs and Domains
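
The URL + Domain extraction step can be approximated with the standard library. This is a rough sketch, not the code used in the ETL: the regular expression, the trailing-punctuation rule, and the function name are all assumptions.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern: stops at whitespace, quotes, and angle
# brackets (Discord wraps links in <...> to suppress embeds).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(content):
    """Return a list of (url, domain) pairs found in a message body."""
    results = []
    for match in URL_PATTERN.findall(content):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?)")
        domain = urlparse(url).hostname  # lowercased, without any port
        if domain:
            results.append((url, domain))
    return results


pairs = extract_urls("See https://chat.openai.com/share/abc, and http://Example.com/x.")
```

Storing the domain separately from the full URL keeps "most linked domain" queries to a simple group-by.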

Completed 2023-11-23

  • Reply data is not parsed from JSON
  • count column name in reactions_t should be reaction_count
  • rename content as msg_content in messages_t
  • rename content_length as msg_content_length in messages_t
  • Create a directory for Logs
  • CSV for Analytics files need their own directory
  • Fix the last couple weird column_names that don't use underscores
    • Fixed guilds_t.iconUrl to icon_url
    • Fixed messages_t.isPinned to be is_pinned
  • Test ORM on Native Postgres loaded queries
  • SQLAlchemy: added the Reply class
  • Neomodel for Neo4J classes complete
  • Ingest of most data into neo4j complete
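
The column-name cleanup above (iconUrl to icon_url, isPinned to is_pinned) follows a mechanical rule that a small helper can apply. The sketch below is an illustration with hypothetical function names; it only builds the rename statement as a string and does not touch a database.

```python
import re

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def to_snake_case(name):
    """Convert a camelCase column name to snake_case."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def rename_statement(table, column):
    """Build an ALTER TABLE statement renaming `column` to its snake_case form.

    The old name is double-quoted because Postgres folds unquoted
    identifiers to lowercase.
    """
    return f'ALTER TABLE {table} RENAME COLUMN "{column}" TO {to_snake_case(column)};'


statement = rename_statement("guilds_t", "iconUrl")
```

Names that are already snake_case pass through unchanged, so the helper can be run over every column without a separate check.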