Class Documentation
Analysis Class Documentation
The Python analysis pipeline is object-oriented. Three Python classes run most of the methods:
- ExportKeybase: Python3 class to generate lists of information via direct interface to Keybase.
 - Lives in `create_export.py`
 - Import using: `from create_export import ExportKeybase`
- GenerateAnalytics: Python3 class to organize different kinds of data from Keybase export.
 - Lives in `generate_analytics.py`
 - Import using: `from generate_analytics import GeneratedAnalytics`
- Messages: Python3 class that uses sqlalchemy to interface with a SQL database.
 - Lives in `database.py`
 - Import using: `from database import Messages`
 - Note: this is a simpler class that really only has a constructor and properties related to the variables of interest that are extracted from the Keybase data.
Notes
Miscellaneous observations during development.
Regarding Implementation
- We currently do not (but could):
 - Import the Pin Message type, because we were unable to find a reference to the message being pinned.
 - Import additional metadata such as: device ID, device name, reactions within a message, team_mentions
 
Regarding Data-Driven Models
- Topic Modeling on channels and across channels
 - Can we train a simple Latent Dirichlet Allocation (LDA) model on channel-based text messages in order to get "good" separation of channels that do not have much overlap, based on what we know and understand about language already?
 - Based on the training data that we have available to perform such a task, do we expect there to be "good" separation of topics by channel from the Complexity Weekend Keybase text database?
 - Do we need a different dataset for Topic Modeling altogether?
- Sentiment Analysis
 - Why does the VADER algorithm think that Jason's Keybase profile has such a negative sentiment score? Are there other, better algorithms? Is there a list of other algorithms with links to source documentation or (even better) related literature to cite?
- ~~Machine Learning~~
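
Before training any topic model, a rough pre-check of the channel separation question could be to measure how much vocabulary the channels share. The sketch below is not LDA; it swaps in a plain Jaccard similarity over word sets, using only the standard library. The function names and the sample data are hypothetical and are not part of the pipeline; real input would come from the Keybase export.

```python
import re
from itertools import combinations


def vocabulary(messages):
    """Return the set of lowercase word tokens used across a list of messages."""
    words = set()
    for text in messages:
        words.update(re.findall(r"[a-z']+", text.lower()))
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def channel_overlap(channel_messages):
    """Map each pair of channel names to the Jaccard similarity of their vocabularies."""
    vocab = {name: vocabulary(msgs) for name, msgs in channel_messages.items()}
    return {
        (x, y): jaccard(vocab[x], vocab[y])
        for x, y in combinations(sorted(vocab), 2)
    }


# Hypothetical sample data for illustration only.
sample = {
    "general": ["welcome to the weekend", "see the schedule"],
    "modeling": ["agent based model of the network", "see the model code"],
}
overlap = channel_overlap(sample)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

Pairs with a similarity close to 1.0 share most of their vocabulary and may be hard for any topic model to tell apart; pairs close to 0.0 are more likely to separate. Removing stop words first would probably make the numbers more informative.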
 
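
VADER is a lexicon- and rule-based scorer, so one thing that may be worth checking for a surprisingly negative profile is which individual words drive the score. The sketch below is a toy stand-in, not VADER itself: the lexicon, its valence values, the function names, and the sample text are all hypothetical and chosen only to show the mechanism.

```python
import re

# Tiny hypothetical valence lexicon; VADER's real lexicon is far larger
# and these values are made up for illustration.
TOY_LEXICON = {"great": 2.0, "love": 3.0, "bug": -1.5, "kill": -3.0, "failure": -2.5}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def toy_sentiment(text, lexicon=TOY_LEXICON):
    """Average the valence of known words; return 0.0 when no word is in the lexicon."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def flagged_words(text, lexicon=TOY_LEXICON):
    """List the (word, valence) pairs behind the score, most negative first."""
    hits = [(t, lexicon[t]) for t in tokenize(text) if t in lexicon]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical technical text: neutral in intent, negative to a word-level lexicon.
profile = "Kill the process after a failure, then fix the bug"
score = toy_sentiment(profile)
drivers = flagged_words(profile)
print(score, drivers)
```

If the words behind a negative score turn out to be technical jargon rather than genuinely negative language, that would suggest the lexicon is a poor fit for the domain rather than that the text is negative.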
Links
Assorted links to tools and readings.
Tools moving forward
- NLTK: Open-source natural language toolkit.
- spaCy: Natural language processing (NLP) API that still provides many useful free tools.
- kumo.io: Interactive network graph visualization tool with easy Import/Export format (and supports export of embedded views).