PyData Global 2022

Production-grade Machine Learning with Flyte
12-03, 15:00–16:30 (UTC), Workshop/Tutorial I

MLOps encapsulates the discipline of – and infrastructure that supports – building and maintaining machine learning models in production. This tutorial highlights five challenges in carrying this out effectively: scalability, data quality, reproducibility, recoverability, and auditability. As a data science and machine learning practitioner, you’ll learn how Flyte, an open source data- and machine-learning-aware orchestration tool, is designed to overcome these challenges, and you'll get your hands dirty using Flyte to build ML pipelines of increasing complexity and scale!


As the discipline of Machine Learning Operations (MLOps) matures, it’s becoming clear that, in practice, building ML models poses additional challenges compared to the traditional software development lifecycle. This tutorial will focus on five challenges in the context of ML model development: scalability, data quality, reproducibility, recoverability, and auditability. Using Flyte, a data- and machine-learning-aware open source orchestration tool, we’ll see how to address these challenges and abstract the solutions to give you a broader understanding of how to surmount them.

First I’ll define and describe what these five challenges mean in the context of ML model development. Then I’ll dive into the ways in which Flyte provides solutions to them, taking you through the reasoning behind Flyte’s data-centric and ML-aware design. We'll cover:

  • Flyte tasks and workflows: The building blocks for expressing execution graphs
  • Dynamic workflows: Define execution graphs at runtime
  • Map tasks: Scale embarrassingly parallel workflows
  • Plugins: Extend Flyte's core functionality
  • Type System: See the benefits of static type safety
  • DataFrame Types: Validate dataframe-like objects at runtime
  • Reproducibility: Containerize and harden your execution graph
  • Caching: Don't waste precious compute re-running nodes
  • Recovering Executions: Build fault-tolerant pipelines
  • Checkpointing: Checkpoint progress within a node
  • Flyte Decks: Create rich static reports associated with your data/model artifacts

Attendees will learn how Flyte distributes and scales computation, enforces static and runtime type safety, leverages Docker to provide strong reproducibility guarantees, implements caching and checkpointing to recover from failed model training runs, and ships with built-in data lineage tracking for full data pipeline auditability.

Tutorial Repo: https://github.com/flyteorg/flyte-conference-talks/tree/main/pydata-global-2022


Prior Knowledge Expected

Previous knowledge expected

Niels is a machine learning engineer, a core maintainer of Flyte, and the creator of UnionML and of Pandera, a data testing tool for dataframe-like objects.

He has a Master's in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology.

His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems. He enjoys developing open source tools to make data science and machine learning practitioners more productive.