The Professional Data Science Manifesto
SOURCE: The Professional Data Science Manifesto - Manifesto of Professional Data Science
Recent technological advances have made data processing accessible, cheap and fast for everyone. We believe that combining engineering practices with the scientific method will extract the most utility from these advances. This manifesto therefore proposes a principled methodology for unifying science and technology by valuing:
- Minimum Viable Products over prototypes
- APIs over databases
- Clever use of computation over convenient assumptions
- Dashboards over reports
- Validation, scrutiny and repeatability over convention and ad verecundiam
That is, while there is value in the items on the right, we value the items on the left more.
Principles
- Aim to completely remove manual intervention in numerical processing.
- Data science is about solving problems, not models or algorithms.
- All validation of data, hypotheses and performance should be tracked, reviewed and automated.
- Prior to building a model, construct an evaluation framework with end-to-end business focused acceptance criteria.
- A product needs a pool of measures to evaluate its quality. A single number cannot capture the complexity of reality.
- Even research can be broken down into clearly defined tasks. The smallest of iterations should be preferred in acquiring, integrating and correcting knowledge.
- Don’t neglect assumptions in models. Make them explicit then aim to have them either verified or removed.
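
As one possible illustration of two of these principles (an evaluation framework built before the model, and a pool of measures rather than a single number), here is a minimal Python sketch. It is not part of the manifesto. The names `Criterion`, `evaluate` and `accepted`, the choice of measures, and every threshold are hypothetical; in practice the acceptance criteria would be agreed with the business before any modelling starts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A measure maps (true labels, predicted labels) to a score in [0, 1].
Measure = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = sum(p == 1 for p in y_pred)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / predicted_pos if predicted_pos else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that were predicted positive."""
    actual_pos = sum(t == 1 for t in y_true)
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / actual_pos if actual_pos else 0.0


@dataclass(frozen=True)
class Criterion:
    """One acceptance criterion: a named measure and its minimum score."""
    name: str
    measure: Measure
    minimum: float  # hypothetical threshold, for illustration only


def evaluate(
    criteria: List[Criterion], y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, dict]:
    """Score every criterion and record whether it passed."""
    report = {}
    for c in criteria:
        score = c.measure(y_true, y_pred)
        report[c.name] = {
            "score": score,
            "minimum": c.minimum,
            "passed": score >= c.minimum,
        }
    return report


def accepted(report: Dict[str, dict]) -> bool:
    """A candidate is accepted only if every criterion passes."""
    return all(entry["passed"] for entry in report.values())


if __name__ == "__main__":
    # The criteria are defined first, independently of any model.
    criteria = [
        Criterion("accuracy", accuracy, minimum=0.70),
        Criterion("precision", precision, minimum=0.70),
        Criterion("recall", recall, minimum=0.80),
    ]
    # Toy labels and predictions, invented for this example.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    report = evaluate(criteria, y_true, y_pred)
    for name, entry in report.items():
        print(name, entry)
    print("accepted:", accepted(report))
```

On the toy data, accuracy and precision clear their thresholds while recall does not, so the candidate is rejected even though a single headline number would have looked acceptable. Because the check is a plain function, it can run unattended on every change, in keeping with the aim of tracked and automated validation.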