PyData Global 2022

Metadata Systems for End-to-End Data & Machine Learning Platforms
12-02, 13:00–13:30 (UTC), Talk Track II

Organisations are increasingly adopting and integrating a large number of different frameworks at each stage of their machine learning lifecycle. Although this has helped reduce time-to-value for real-world AI use cases, it has come at the cost of added complexity and interoperability bottlenecks.


Overview

Each stage in the end-to-end lifecycle involves different stakeholders who make decisions and perform actions that can modify data and/or ML components, with use-case-specific but ever-compounding risks. This results in a growing need to ensure that a minimum level of metadata is collected, tracked and managed. It becomes increasingly important given the need to meet relevant overarching compliance requirements, as well as architectural requirements on lineage, auditability, accountability and reproducibility.

In this session we will dive into the challenges present in the metadata layer of large-scale systems, as well as the tooling, best practices and solutions that can be adopted to tackle them. We will discuss the rise of metadata management systems, the challenges they have been able to solve, and the critical shortcomings where ecosystem-wide collaboration, starting from tooling-level alignment, will be key to ensuring the long-term robustness of these heterogeneous end-to-end platforms.

Benefits to the ecosystem

In recent years we have seen how the evolution of DataOps and MLOps has introduced further complexities involving concepts such as data versioning, model versioning, model registries, ML experiment tracking, ML model deployment, ML model promotion and monitoring. These developments have raised new challenges that the ecosystem has been able to tackle by extending existing metadata management tools, as well as by creating new ones.

This talk aims to further the discussion on existing best practices where metadata schemas, protocols and tooling have been successful in enabling interoperability across multiple systems in these end-to-end platforms. We hope that it brings benefits to the ecosystem that go beyond this session, leading to actionable discussions, collaborations and open source contributions that continue the momentum on improving interoperability across the MLOps ecosystem.


Prior Knowledge Expected

Previous knowledge expected

Alejandro Saucedo is the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he contributes to policy and industry standards on the responsible design, development and operation of AI, including the fields of explainability, GPU acceleration, ML security and other key machine learning research areas. Alejandro is also Director of Engineering at Seldon Technologies, where he leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and has a strong track record of building cross-functional teams of software engineers. He currently serves as governing council Member-at-Large at the Association for Computing Machinery, and is the Chairperson of the GPU Acceleration Kompute Committee at the Linux Foundation.

LinkedIn: https://linkedin.com/in/axsaucedo
Twitter: https://twitter.com/axsaucedo
GitHub: https://github.com/axsaucedo
Website: https://ethical.institute/
