PyData Global 2022

The 10 commandments of reliable data science
12-01, 22:00–22:30 (UTC), Talk Track I

Data science as a professional discipline is still in its infancy, and our field lacks widespread technical norms around project organization, collaboration, and reproducibility. This is painful for both practitioners and their end users because disorganized analysis is bad analysis, and bad analysis costs money and wastes time. This talk presents ten principles for correct and reproducible data science that inherit from software engineering’s seven decades of hard-earned lessons and from experience with data science teams at organizations of all sizes. We motivate these principles by looking at some hard truths about data science “in the wild.”


Organizations have accepted the premise that data science, done well, can be a powerful toolbox for increasing efficiency, automating expensive processes, and making better decisions. But which project decisions characterize, and tend to result in, “high quality” data science work products? As a field, we have yet to converge on a set of engineering norms that promote organized, correct, and reproducible analysis.

Trustworthy data science work requires slightly more upfront investment, but pays immense dividends in dependability, usefulness, and ease of collaboration. This talk presents ten principles for correct and reproducible data science that inherit from software engineering’s seven decades of hard-earned lessons.

Combining lessons learned from the software world with field experience from observing numerous data science teams of all sizes and configurations, these principles provide both tactical recommendations and higher-level ideas relevant to individual practitioners, engineering managers, and senior leaders of data organizations.


Prior Knowledge Expected

No previous knowledge expected

Isaac is a co-founder and principal data scientist at DrivenData, Inc. He holds a master's in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.