PyData Global 2022

Single node shared memory comes to dask
12-01, 18:30–19:00 (UTC), Talk Track II

The Ray project has show that having a shared memory facility greatly helps in certain compute problems, particularly where the job can be performed on a single large machine as opposed to a cluster. We present preliminary work showing that Dask can also achieve the same benefits.


Because of the global interpreter lock in python, it is not possible to parallelise all jobs across the threads of a single process. This is a real shame, because within a process, objects can be shared between threads by reference, without copy. So a typical dask cluster will consist of a number of processes with a few threads each, even on a large single node system with enough global memory for the problem at hand. This means expensive duplication of objects across processes, wasting memory resources and CPU time in serialisation and interprocess communication. As very large machines are as easy to obtain as clusters of smaller ones in today's cloud computing environments, Dask needs to better service such large single-node jobs by providing shared memory.

One of the strengths of the Ray compute model, is the ability to use posix shared memory across processes, via the apache arrow plasma library. Although it adds complexity the the system, the savings on memory usage , serialisation and communication are significant enough that this is seen as one of the real advantages of Ray.

In this talk, we will present prototype shared memory solutions for dask. We will demonstrate potential backends: lmbd, plasma, vineyard and multiprocessing.SharedMemory, and show that we really do achieve the memory use and efficiency savings envisioned. We will compare the performance and features of the two proposed backends.

Finally, we will look ahead to the possible extension to shared memory on multi-node systems, and possible smart spilling that a central memory manager makes possible.


Prior Knowledge Expected

No previous knowledge expected

Staff Software Engineer