PyData Global 2022

Parallelization of code in Python for beginners
12-01, 18:00–18:30 (UTC), Talk Track II

Stuck with long-running code that takes too long to complete, if it completes at all? Learn to think strategically about parallelizing your workflows: which characteristics make a workflow a good candidate for parallelization, and which options exist in Python for executing it. The talk eschews PySpark and other big data platforms.


Data scientists and engineers manipulating data in Python can run into performance issues when executing calculations sequentially across a large dataset. One may want quicker runtimes to facilitate rapid prototyping, or may run short of compute resources when processing a single large chunk of data all at once. This conceptually oriented talk offers solutions that speed up slow-running workflows and relax constraints on computational resource capacity.

We explain the benefits of parallelized computing from a beginner’s perspective. We describe common patterns for parallel computing and how they map to practitioners’ workflows. Finally, we tie this together with working examples using the Python package joblib, showing how this tool can facilitate parallel computing. Along the way, we highlight typical examples matching problem to solution that should resonate with data scientists and engineers.
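As a flavor of the kind of example the talk works through, here is a minimal sketch of the core joblib pattern; the `square` function and its inputs are illustrative placeholders, not material from the talk itself:

```python
from joblib import Parallel, delayed

def square(x):
    # Stand-in for an expensive, independent computation.
    return x * x

# Fan the calls out across 2 worker processes; results come back in input order.
results = Parallel(n_jobs=2)(delayed(square)(i) for i in range(10))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Each call to `square` is independent of the others, which is exactly the property that makes a workflow a good candidate for this kind of parallelization.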

The goal of this talk is to equip relative beginners with fundamental parallel computing concepts that can make them more productive, and to help them understand why their code’s performance degrades and what levers exist to tune it.

This talk is appropriate for analytics practitioners who have medium-to-large data needs but don’t want to deal with big data platforms just yet.


Prior Knowledge Expected

No previous knowledge expected

Cheryl has worked in the data science and predictive modeling field for over ten years, and has a background in computer science, applied math, and actuarial science.