PyData Global 2022

Managing Python Dependencies at scale
12-01, 08:00–08:30 (UTC), Talk Track II

This talk is about the approach we've taken at Apache Airflow to manage dependencies at the scale of a project that is the most popular data orchestrator in the world, consists of ~80 independent packages, and has more than 650 dependencies in total (and how we did not lose our sanity).


In this talk we will discuss the challenges we faced when managing Apache Airflow dependencies at scale. The maintainer who serves as the "dependency maintainer" for Apache Airflow will present the solutions that allowed Airflow to survive breaking one monolithic package into about 80 smaller ones and to keep releasing Airflow for the last 4 years.

Despite the complexity of the Python dependency world and breaking changes in PyPI and setuptools, we will tell you how he kept his sanity while managing that environment: with non-stop development of Airflow through multiple releases, keeping our users happy and letting them not only consistently install but also upgrade their dependencies while they use Airflow, and surviving a number of severely breaking changes that our dependencies introduced - breaking multiple other packages out there.
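The abstract does not name the mechanism itself, but one publicly documented example of how Airflow lets users install consistently is the constraint file published for every release; a minimal sketch (the Airflow and Python versions below are only illustrative):

    pip install "apache-airflow==2.4.3" \
      --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.3/constraints-3.8.txt"

The constraint file freezes the full set of transitive dependencies to versions tested together, yet because constraints are applied only at install time (not baked into the package metadata), users stay free to upgrade individual dependencies afterwards.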

Apache Airflow is one of the biggest projects on PyPI when it comes to dependencies. Airflow itself consists of the main "Airflow" package, but in addition to that Apache Airflow releases 70+ provider packages that provide optional Airflow functionality.
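To make that structure concrete: the providers are separate distributions on PyPI, pulled in either through an extra on the core package or installed directly (the Google provider is just one example):

    pip install "apache-airflow[google]"
    pip install apache-airflow-providers-google

Both commands end up installing the same provider distribution; the extra is simply a convenience alias defined by the core package.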

We regularly release - at least 20-30 packages a month (sometimes all 70+), and when you count all transitive dependencies we have way more than 500 (!) Python dependencies. It's so big that we broke pip after the new dependency resolver was introduced and got into an argument with PyPI maintainers, which finally led to building new friendships (and enemies) and helping PyPI become more stable and robust.

The talk also shows how you can keep your dependencies secure without spending a ton of time testing the latest security fixes - which is absolutely necessary in the wake of growing "supply-chain security" awareness.

The story is quite fascinating (for those who are fascinated by dependency-hell management, that is) - and while you might not have as big a scope as Airflow has, you may pick up a few tricks and approaches that could be useful in your own project.


Prior Knowledge Expected

Previous knowledge expected

Independent Open-Source Contributor and Advisor, Committer and PMC member of Apache Airflow, Member of the Apache Software Foundation

Jarek is an engineer with broad experience in many subjects - Open Source, Cloud, Mobile, Robotics, AI, Backend, Developer Experience - but a lot of non-engineering experience is under his belt as well: running a company, being a CTO, organizing big international community events, technical sales support, PR and marketing advisory, and looking at the legal aspects of licensing and building open-source communities.

With experience in very small and very big companies and everything in between, Jarek has found his place in the open-source world, where his individual-contributor drive can be used to its utmost potential.