PyData Global 2022

ML Model Traceability and Reproducibility by Design
12-03, 09:30–10:00 (UTC), Talk Track I

Model traceability and reproducibility are crucial when deploying machine learning models. Model traceability allows us to know which version of the model generated which prediction. Model reproducibility ensures that we can roll back to any previous version of the model at any time.
We, as ML engineers, designed reusable workflows that enable data scientists to follow these two principles by design.
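To make the two principles concrete, here is a minimal sketch that assumes a toy in-memory registry rather than any real tool. `ModelRegistry`, `ModelVersion`, `register`, `rollback`, and `predict` are hypothetical names, and the "model" is a placeholder that sums its input features. Registered versions are never overwritten, each prediction is stamped with the version that produced it (traceability), and any earlier version can be re-activated (reproducibility).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModelVersion:
    """Immutable record of one trained model version (hypothetical schema)."""
    version: int
    artifact_uri: str  # illustrative location of the serialized model


@dataclass
class ModelRegistry:
    """Toy in-memory registry; versions are appended and never overwritten."""
    versions: List[ModelVersion] = field(default_factory=list)
    active: Optional[int] = None

    def register(self, artifact_uri: str) -> ModelVersion:
        # New versions get consecutive numbers and become the active version.
        mv = ModelVersion(version=len(self.versions) + 1, artifact_uri=artifact_uri)
        self.versions.append(mv)
        self.active = mv.version
        return mv

    def rollback(self, version: int) -> ModelVersion:
        # Reproducibility: any earlier version can be re-activated because
        # nothing in the registry is ever deleted.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown model version: {version}")
        self.active = version
        return self.versions[version - 1]

    def predict(self, features: dict) -> dict:
        # Traceability: each prediction records the version that produced it.
        if self.active is None:
            raise RuntimeError("no model version registered")
        score = float(sum(features.values()))  # placeholder for a real model call
        return {"prediction": score, "model_version": self.active}


registry = ModelRegistry()
registry.register("models/v1")
registry.register("models/v2")
print(registry.predict({"x": 1.0, "y": 2.0}))  # {'prediction': 3.0, 'model_version': 2}
registry.rollback(1)
print(registry.predict({"x": 1.0, "y": 2.0}))  # {'prediction': 3.0, 'model_version': 1}
```

Keeping versions append-only is what makes rollback safe: restoring version 1 only changes which record is active, not the records themselves.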


We would like to present our reusable workflows, which can be imported into any data science project repository and used to deploy ML models to production following MLOps principles. The workflows depend heavily on the tech stack in our organization. We focus mainly on traceability and reproducibility: at each API request, we link every prediction generated by the model to the GitHub commit hash, the Databricks run_id, and the MLflow run_id. In this way, we ensure the following:

For any given ML model, it is possible to look up unambiguously:
- Corresponding code/commit in Git
- Infrastructure used for training and serving
- Environment used for training and serving
- ML model artifacts
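The linking step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the workflow presented in the talk: it assumes the deployment pipeline exposes the three identifiers to the serving process as environment variables with the hypothetical names `GIT_COMMIT_SHA`, `DATABRICKS_RUN_ID`, and `MLFLOW_RUN_ID`, and it uses a Python list as a stand-in for a prediction store. `Lineage`, `predict_with_lineage`, and `trace` are likewise hypothetical names, and the identifier values in the usage lines are made-up placeholders.

```python
import os
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Lineage:
    """Identifiers that tie a deployed model back to its code, job run, and MLflow run."""
    git_commit: str
    databricks_run_id: str
    mlflow_run_id: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Lineage":
        # Assumption: the deployment pipeline sets these (hypothetical) variables.
        env = os.environ if env is None else env
        return cls(
            git_commit=env.get("GIT_COMMIT_SHA", "unknown"),
            databricks_run_id=env.get("DATABRICKS_RUN_ID", "unknown"),
            mlflow_run_id=env.get("MLFLOW_RUN_ID", "unknown"),
        )


def predict_with_lineage(
    model: Callable[[Dict[str, float]], float],
    features: Dict[str, float],
    lineage: Lineage,
    log: List[dict],
) -> dict:
    """Run the model and attach the lineage identifiers to the response."""
    record = {
        "request_id": str(uuid.uuid4()),
        "prediction": model(features),
        **asdict(lineage),
    }
    log.append(record)  # stand-in for a persistent prediction store
    return record


def trace(log: List[dict], request_id: str) -> Lineage:
    """Look up which commit, Databricks run, and MLflow run produced a prediction."""
    for record in log:
        if record["request_id"] == request_id:
            return Lineage(
                git_commit=record["git_commit"],
                databricks_run_id=record["databricks_run_id"],
                mlflow_run_id=record["mlflow_run_id"],
            )
    raise KeyError(request_id)


# Placeholder identifiers, for illustration only.
lineage = Lineage.from_env(
    {"GIT_COMMIT_SHA": "abc123", "DATABRICKS_RUN_ID": "42", "MLFLOW_RUN_ID": "run-7"}
)
log: List[dict] = []
response = predict_with_lineage(lambda f: sum(f.values()), {"x": 1.0}, lineage, log)
print(trace(log, response["request_id"]))
# Lineage(git_commit='abc123', databricks_run_id='42', mlflow_run_id='run-7')
```

Once a prediction is traced to these three identifiers, the commit points to the code. Assuming the training job logs its model and environment to the MLflow run, the MLflow run_id leads to the artifacts and environment, while the Databricks run_id points to the job run and the infrastructure it ran on.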


Prior Knowledge Expected

Previous knowledge expected

Basak Eskili is a Machine Learning Engineer at Ahold Delhaize. She works on creating new tools and infrastructure that enable data scientists to quickly operationalise algorithms. She bridges the gap between data scientists and platform engineers while improving the way of working in accordance with MLOps principles.

In her previous role, she was responsible for bringing models to production. She focused on NLP projects and building data processing pipelines. Basak also implemented new solutions using cloud services for existing applications and databases to save time and improve efficiency.

Maria is a Senior Machine Learning Engineer at Ahold Delhaize. She bridges the gap between data scientists, infrastructure, and IT teams at different brands and focuses on standardizing machine learning operations across all brands within Ahold Delhaize.
Over nine years in Data & Analytics, Maria has worked in different roles, from data scientist to machine learning engineer, has been part of teams in various domains, and has built broad knowledge. Maria believes that a model only starts living when it is in production. For this reason, for the last seven years her focus has been on automating and standardizing processes related to machine learning.