Takuya Ueshin PyData Global 2022

Takuya Ueshin

Takuya Ueshin is a software engineer at Databricks and an Apache Spark committer and PMC member. His main interests are Spark SQL internals and PySpark. He is one of the major contributors to pandas API on Spark, formerly the Koalas project.

Sessions

12-02

20:00

30min

Scale Data Science by Pandas API on Spark

Xinrong Meng, Takuya Ueshin

With Python emerging as the primary language for data science, pandas has grown rapidly to become one of the standard data science libraries. One known limitation of pandas is that it does not scale linearly with data volume, because it processes data on a single machine.
Pandas API on Spark overcomes this limitation, enabling users to work with large datasets by leveraging Apache Spark. In this talk, we will introduce Pandas API on Spark and help you scale your existing data science workloads with it. We will also share the cutting-edge features in Pandas API on Spark.

Talk Track I
