PyData Global 2022

Sammy Sidhu

Sammy Sidhu is co-founder and CEO of Eventual. His background is in High Performance Computing (HPC) and Deep Learning, and he has over a dozen patents and publications in the space. In the past, he has worked on high-frequency trading on Wall Street, medical AI research at Berkeley, and self-driving cars at both DeepScale (acquired by Tesla) and Lyft Level 5 (acquired by Toyota). Native to the Bay Area, Sammy graduated from UC Berkeley with a degree in Electrical Engineering and Computer Science.

Sessions

12-02
19:30
30min
Daft: the Distributed Python Dataframe for "Complex Data" (images, video, documents and more)
Jay Chia, Sammy Sidhu

Daft is an open-source distributed dataframe library built for "Complex Data" (data that doesn't usually fit in a SQL table, such as images, videos, and documents).

Experiment Locally, Scale Up in the Cloud

Daft grows with you: it is built to run just as efficiently and seamlessly in a notebook on your laptop as on a Ray cluster of thousands of machines with GPUs.

Pythonic

Daft lets you build tables of any Python object, such as images, audio, documents, or genomic files. This makes it easy to process your Complex Data alongside all your regular tabular data. Daft is dynamically typed and built for fast iteration, experimentation, and productionization.

Blazing Fast

Daft is built for distributed computing and fully utilizes all of your machine's or cluster's resources. It uses modern technologies such as Apache Arrow, Parquet, and Iceberg to optimize data serialization and transport.

Talk Track II