PyData Global 2022

Hugo Bowne-Anderson

Hugo Bowne-Anderson is Head of Developer Relations at Outerbounds, a company committed to building infrastructure that provides a solid foundation for machine learning applications of all shapes and sizes. He is also host of the industry podcast Vanishing Gradients. Hugo is a data scientist, educator, evangelist, content marketer, and data strategy consultant, with extensive experience at Coiled, a company that makes it simple for organizations to scale their data science seamlessly, and DataCamp, the online education platform for all things data. He also has experience teaching basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, conferences such as SciPy, PyCon, and ODSC and with organizations such as Data Carpentry. He has developed over 30 courses on the DataCamp platform, impacting over 2 million learners worldwide through his own courses. He also created the weekly data industry podcast DataFramed, which he hosted and produced for 2 years. He is committed to spreading data skills, access to data science tooling, and open source software, both for individuals and the enterprise.

The speaker's profile picture

Sessions

12-01
09:30
90min
Full-stack Machine Learning for Data Scientists
Hugo Bowne-Anderson

One of the key questions in modern data science and machine learning, for businesses and practitioners alike, is how do you move machine learning projects from prototype and experiment to production as a repeatable process. In this workshop, we present a hands-on introduction to the landscape of production-grade tools, techniques, and workflows that bridge the gap between laptop data science and production ML workflows. Participants will learn how to take common machine learning models, such as those from scikit-learn, XGBoost, and Keras, and productionize them using Metaflow.

We’ll present a high-level overview of the 8 layers of the ML stack: data, compute, versioning, orchestration, software architecture, model operations, feature engineering, and model development. We’ll present a schematic as to which layers data scientists need to be thinking about and working with, and then introduce attendees to the tooling and workflow landscape. In doing so, we’ll present a widely applicable stack that provides the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure.

You can find the companion repository for the workshop here: https://github.com/outerbounds/full-stack-ML-metaflow-tutorial.

Workshop/Tutorial I