ETL to QE, Update 7, Fixing Bugs
Date: 2023-10-11
See Discord Binding for project context
guild_id column not in authors_t table
commit: 112a76164e756e4edf1c14fae92f57de441ffe5e
I tried to run some aggregation queries on the authors_t table, such as What discord guild's have the most Users?, but found that guild_id was missing. I need to have the data noramlized because if I use a join the query can be an order of magnitude slower or even more. It is best to perform aggregations, for example using sum
or group by
in a single table.
isBot column not indexing in any table
commit: 112a76164e756e4edf1c14fae92f57de441ffe5e
I noticed when doing my initall analytics when asking questions like, What discord guilds have the most messages? that some guilds had millions of messages and others had 10's of thousands. I realized that this data is not useful unless I know who is a bot. Luckily this data was in the export but did not make it into the SQL database. I ran the following query to test if the isBot data was in the database or not.
SELECT
distinct isBot
from
messages_t
The result of the query above only returned a single row therefore the data was not getting indexed. I was able to go troubleshoot the source code and found I was pulling the incorrect key from the JSON from DiscordChatExporter
roles_t table not indexing too much redundant data
commit: 398ff8c73ac9d82e3cb95c5ea05700ca64ea8a5e
The roles_t table had almost as many rows as the messages_t table which did not make sense( see code snippit below). I noticed that roles_t did not have a primary key which was allowing this to happen. I went back into the code, added a primary key to the schema, and generated a primary key inside the ExportDiscord.process_discord_json method, and added a on conflict
statement to the insert for roles_t
# Redacted
Getting Buckets
mah_json
Running process_discord_json
Done Running process_discord_json
json_data_to_sql Inserting
# Redacted
tbd_table_name = guilds
1
tbd_table_name = channels
1
tbd_table_name = messages
20768
tbd_table_name = authors
6417
tbd_table_name = roles
257437