PyData Global 2022

Stefan Krawczyk

A hands-on leader and Silicon Valley veteran, Stefan has spent the last 15 years working on data and machine learning systems at companies like Stitch Fix, Nextdoor and LinkedIn.

Most recently, Stefan led the Model Lifecycle team at Stitch Fix. Its mission was to streamline the model productionization process for more than 100 data scientists and machine learning engineers. The infrastructure they built created and tracked tens of thousands of models, and provided automated deployment that adhered to MLOps best practices.

A regular conference speaker, Stefan has guest lectured at Stanford’s Machine Learning Systems Design course and is an author of Hamilton, a popular open-source framework.


Sessions

12-03
14:00
30min
Scalable Feature Engineering with Hamilton
Elijah ben Izzy, Stefan Krawczyk

In this talk we present Hamilton, a novel open-source framework for developing and maintaining scalable feature engineering dataflows. Hamilton was initially built to solve the problem of managing a codebase of transforms on pandas dataframes, enabling a data science team to scale the capabilities they offer with the complexity of their business. Since then, it has grown into a general-purpose tool for writing and maintaining dataflows in Python. We introduce the framework, discuss its motivations and initial successes at Stitch Fix, and share recent extensions that seamlessly integrate it with distributed compute offerings, such as Dask, Ray, and Spark.

Talk Track I