PyData Global 2022

Deploying Dask
12-01, 15:30–16:00 (UTC), Talk Track II

Dask is a framework for parallel computing in Python.
It's great, until you need to set it up.

Kubernetes? Cloud? HPC? SSH? YARN/Hadoop even?
What's the right deployment technology to choose?

After you set it up, a new set of problems arises:

  • How do you install software across the cluster?
  • How do you secure network access?
  • How do you access secure data that needs credentials?
  • How do you track who uses it and constrain costs?
  • When things break, how do you track them down?

Solutions to these problems exist in open source packages like dask-kubernetes, Helm charts, dask-cloudprovider, and dask-gateway, as well as in commercially supported products like Coiled, Saturn, QHub, AWS EMR, and GCP Dataproc. How do we choose?

This talk describes the problems faced by people trying to deploy any distributed computing system, and tries to construct a framework to help them decide how to deploy.



Prior Knowledge Expected

No previous knowledge expected

Matthew is a long-time open source software developer in the Python data ecosystem. He’s worked on several libraries, but is primarily known for his work on Dask, a library for parallel computing in Python. Matthew started working on Dask at Anaconda, then moved to NVIDIA, and finally built his own company around Dask, named Coiled.