ETL to QE, Update 4, Code Refactor and TrueNAS S3

Date: 2023-10-05

See Discord Binding for project context

Now just run a single script

Today I read through the README for the Discord Binding project and realized that the system could not be run end to end with a single script.

To use S3 or not to use S3

I have been going back and forth on whether my scripts should read the raw JSON files from disk or from S3. After my Rclone failure on Day 2 I decided against using S3 for now. What I have done instead is use rsync to back up everything to a share on my server running TrueNAS Scale. With TrueNAS Scale I can expose that storage however I want. The options I have are:

  1. NFS
  2. Samba
  3. iSCSI
  4. WebDav
  5. FTP
  6. sshfs
  7. rsync or scp
  8. S3

So it turns out TrueNAS Scale has an S3 service you can simply turn on in settings. Point the service at the desired share, set the public (access) and private (secret) keys, and you are good to go.
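To make the disk-versus-S3 choice concrete, here is a minimal sketch of reading the raw JSON files from an S3-compatible endpoint with boto3. The endpoint URL, bucket name, and prefix in the comments are hypothetical placeholders, and the helper names `make_s3_client` and `iter_json_objects` are made up for illustration; this is not how the Discord Binding scripts are actually written.

```python
import json
from typing import Any, Iterator, Tuple


def make_s3_client(endpoint_url: str, access_key: str, secret_key: str):
    """Build a boto3 client aimed at a self-hosted S3-compatible endpoint."""
    import boto3  # imported lazily so the reader below works with any client

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,          # assumption: the URL of the NAS S3 service
        aws_access_key_id=access_key,       # the "public" key from the service settings
        aws_secret_access_key=secret_key,   # the "private" key from the service settings
    )


def iter_json_objects(client, bucket: str, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (key, parsed JSON) for every .json object under a prefix."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" entry at all.
        for entry in page.get("Contents", []):
            key = entry["Key"]
            if not key.endswith(".json"):
                continue
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            yield key, json.loads(body)


# Hypothetical usage (endpoint, bucket, and prefix are placeholders):
#   client = make_s3_client("http://nas.local:9000", "ACCESS_KEY", "SECRET_KEY")
#   for key, doc in iter_json_objects(client, "discord-raw", "guilds/"):
#       print(key, type(doc))
```

Because the reader only needs `get_paginator` and `get_object`, the rest of the pipeline does not have to care whether the bucket lives on the NAS or somewhere else.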

Just like when I decided to remove SQLite from the ETL pipeline to make things simpler, having the pipeline work with S3 was also going to make things simpler.

Problems installing Postgres on Debian

Installing Postgres on my Debian server was very annoying. I was easily able to:

  1. Install it using apt
  2. Start it using systemctl
  3. Login using account postgres
  4. Login using psql referencing localhost

But when it came to logging in from another machine using the server's IP address, I was out of luck. No firewall is running on this device. I followed this Stack Overflow post and created a question, How to allow remote IP address to connect to Postgres server?, for future me to answer. After an hour of this I gave up, because Docker would just work immediately with no settings having to be changed.

Note: none of my services, Postgres or S3, are publicly exposed on the internet.
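A quick way to narrow down this kind of failure is to check whether the Postgres port is reachable over TCP at all from the other machine. The sketch below is only a diagnostic; the function name `can_reach` and the example IP address are hypothetical. If the port is unreachable even with no firewall, a likely suspect is that a stock Debian install listens only on localhost (the `listen_addresses` setting in `postgresql.conf`). If the port is reachable but logins are rejected, the `pg_hba.conf` host rules are the next place to look. Both of those are my assumptions about the cause, not something confirmed above.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage from the client machine (the IP is a placeholder):
#   print(can_reach("192.168.1.50"))
```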

Trying to use Hamilton DAG

  • Get Discord Binding working with Hamilton DAG
  • The old Postgres data is not available in part of the DAG.
  • Tomorrow I will have all the data loaded into Postgres so I can get storage estimates for how much RAM I will need in a Spark server.

Other

  • Talk about wanting to grab data by guild, load it into the system, and do analytics, never having everything in memory at the same time, just like how CPUs have different levels of cache and the human brain can only really keep track of about 7 things at once.
  • Explain the value of Interrogation
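The guild-by-guild idea above can be sketched with a generator that loads one guild, runs an analysis, and moves on before touching the next. The directory layout (`root/<guild_id>/*.json`, each file holding a JSON list of records) and the names `iter_guilds` and `per_guild` are assumptions made up for this example, not the project's real structure.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple


def iter_guilds(root: Path) -> Iterator[Tuple[str, List[dict]]]:
    """Yield (guild_id, records) one guild at a time.

    Assumes a hypothetical layout of root/<guild_id>/*.json, where each file
    holds a JSON list of message records.
    """
    for guild_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        records: List[dict] = []
        for path in sorted(guild_dir.glob("*.json")):
            records.extend(json.loads(path.read_text(encoding="utf-8")))
        yield guild_dir.name, records


def per_guild(root: Path, analyze: Callable[[List[dict]], int]) -> Dict[str, int]:
    """Run analyze on each guild in turn, keeping only the small result."""
    # Only one guild's records are referenced at any moment; the previous
    # guild's list becomes garbage as soon as the generator advances.
    return {guild_id: analyze(records) for guild_id, records in iter_guilds(root)}
```

Calling `per_guild(Path("data"), len)`, for example, would return a message count per guild while only ever holding a single guild's records in memory.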