BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//global2022.pydata.org//
BEGIN:VEVENT
UID:pretalx-cfp-7BCESD@global2022.pydata.org
DTSTART:20221201T080000Z
DTEND:20221201T083000Z
DESCRIPTION:Counterfactual explanations (CFE) are methods that explain a m
 achine learning model by giving an alternate class prediction of a data p
 oint with some minimal changes in its features.\nIn this talk\, we describ
 e a counterfactual (CF) generation method based on particle swarm optimiz
 ation (PSO) and how we can have greater control over the proximity and spa
 rsity properties of the generated CFs.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Generate Actionable Counterfactuals using Multi-objective Particle
Swarm Optimization - Niranjan G S\, SHASHANK SHEKHAR
URL:https://global2022.pydata.org/cfp/talk/7BCESD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-BPFCBT@global2022.pydata.org
DTSTART:20221201T080000Z
DTEND:20221201T083000Z
DESCRIPTION:This talk is about the approach we've taken at Apache Airfl
 ow to manage dependencies at the scale of a project that is the most po
 pular Data Orchestrator in the world\, consists of ~80 independent packag
 es\, and has more than 650 dependencies in total (and how we did not lo
 se our sanity).
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Managing Python Dependencies at scale - Jarek Potiuk
URL:https://global2022.pydata.org/cfp/talk/BPFCBT/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-RR3A9Y@global2022.pydata.org
DTSTART:20221201T083000Z
DTEND:20221201T090000Z
DESCRIPTION:For enterprises to adopt and embrace AI into their transformati
 onal journey\, it is imperative to build Trustworthy AI\, so that AI produ
 cts and solutions that are built\, delivered\, and acquired are responsibl
 e enough to drive trust and wider adoption. We look at AI Trust as a funct
 ion of 4 key constructs which include Reliability\, Safety\, Transparency\
 , Responsibility and Accountability. These core constructs are pilla
 rs of driving AI trust in our products and solutions. In this talk\, I wi
 ll explain how to enable each core construct and will articulate how th
 ey can be measured in some real-world use cases.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Measurement of Trust in AI - SHASHANK SHEKHAR
URL:https://global2022.pydata.org/cfp/talk/RR3A9Y/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-S7T7VG@global2022.pydata.org
DTSTART:20221201T083000Z
DTEND:20221201T090000Z
DESCRIPTION:When the goal of your study is to analyze and forecast volatili
 ty\, ARCH/GARCH models come into the picture to solve complicated time se
 ries problems.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:ARCH/GARCH Models Tour - Kalyan Prasad
URL:https://global2022.pydata.org/cfp/talk/S7T7VG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-ZEZKSQ@global2022.pydata.org
DTSTART:20221201T090000Z
DTEND:20221201T093000Z
DESCRIPTION:Have you ever trained an awesome model just to have it break i
 n production because of a null value? At its core\, a feature store need
 s to provide reliable features to data scientists to build and producti
 onize models. So how can we avoid garbage in\, garbage out situations? Gr
 eat Expectations is the most popular library for data validation\, and s
 o the two are a natural fit. In this talk\, we will touch briefly upon di
 fferent Python data validation libraries such as Pydantic and Pandera\, b
 ut then dive deeper into Great Expectations’ concepts and how you can l
 everage them in feature pipelines powering a feature store.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Data Validation for Feature pipelines: Using Great Expectations and
Hopsworks - Moritz Meister
URL:https://global2022.pydata.org/cfp/talk/ZEZKSQ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-TWB8CA@global2022.pydata.org
DTSTART:20221201T090000Z
DTEND:20221201T093000Z
DESCRIPTION:The pandas library is one of the key factors that enabled the g
rowth of Python in the Data Science industry and continues to help data sc
ientists thrive almost 15 years after its creation. Because of this succes
s\, nowadays several open-source projects claim to improve pandas in vario
us ways\, either by bringing it to a distributed computing setting (Dask)\
, accelerating its performance with minimal changes (Modin)\, or offering
 a slightly different API that solves some of its shortcomings (Polars).\n
 \nIn this talk\, we will dive into Polars\, a new dataframe library backe
 d by Arrow and Rust that offers an expressive API for dataframe manipul
 ation with excellent performance.\n\nIf you are a seasoned pandas user wi
 lling to explore alternatives\, or a beginner wondering what all the fus
 s about these new dataframe libraries is\, this talk is for you!
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Expressive and fast dataframes in Python with polars - Juan Luis Ca
no Rodríguez
URL:https://global2022.pydata.org/cfp/talk/TWB8CA/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CLKMWR@global2022.pydata.org
DTSTART:20221201T093000Z
DTEND:20221201T110000Z
DESCRIPTION:One of the key questions in modern data science and machine lea
rning\, for businesses and practitioners alike\, is how do you move machin
e learning projects from prototype and experiment to production as a repea
table process. In this workshop\, we present a hands-on introduction to th
e landscape of production-grade tools\, techniques\, and workflows that br
idge the gap between laptop data science and production ML workflows. Part
icipants will learn how to take common machine learning models\, such as t
hose from scikit-learn\, XGBoost\, and Keras\, and productionize them usin
g Metaflow.\n\nWe’ll present a high-level overview of the 8 layers of th
e ML stack: data\, compute\, versioning\, orchestration\, software archite
cture\, model operations\, feature engineering\, and model development. We
’ll present a schematic as to which layers data scientists need to be th
inking about and working with\, and then introduce attendees to the toolin
g and workflow landscape. In doing so\, we’ll present a widely applicabl
e stack that provides the best possible user experience for data scientist
s\, allowing them to focus on parts they like (modeling using their favori
te off-the-shelf libraries) while providing robust built-in solutions for
the foundational infrastructure.\n\nYou can find the companion repository
for the workshop here: https://github.com/outerbounds/full-stack-ML-metafl
ow-tutorial.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Full-stack Machine Learning for Data Scientists - Hugo Bowne-Anders
on
URL:https://global2022.pydata.org/cfp/talk/CLKMWR/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-ZNMNCX@global2022.pydata.org
DTSTART:20221201T100000Z
DTEND:20221201T103000Z
DESCRIPTION:Inequality joins are less frequent than equality joins\, but ar
e useful in temporal analytics and even in some conventional applications.
Pyjanitor fills this gap in Pandas with an efficient implementation
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Inequality Joins in Pandas with Pyjanitor - samuel oranyeli
URL:https://global2022.pydata.org/cfp/talk/ZNMNCX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-L9PMLX@global2022.pydata.org
DTSTART:20221201T100000Z
DTEND:20221201T103000Z
DESCRIPTION:Are you fascinated by the real-life images or text produced by
deep generative models but cannot interpret their underlying data generati
on process or see how they can be applied to other problems? I will talk a
bout generative simulations built using knowledge of the problem domain th
at can produce realistic data in a variety of scenarios. This talk will be
a Bayesian thinking exercise cum data science case study of product star
rating timeseries from an online marketplace (like Amazon.com) – I will
show how we use recent advances in likelihood-free Bayesian inference toge
ther with a detailed simulation of an online marketplace to directly infer
factors involved in how customers purchase and rate products.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Interpretable and realistic generative models in data science? Like
lihood-free Bayes’ says yes! - Narendra Mukherjee
URL:https://global2022.pydata.org/cfp/talk/L9PMLX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WHBJL9@global2022.pydata.org
DTSTART:20221201T103000Z
DTEND:20221201T110000Z
DESCRIPTION:It’s crunchy! It’s sweet! Maybe it is the presence of the n
uts or their absence. There are various features that make you favor a par
ticular cereal. Now surely\, if we modeled the consumer ratings for cereal
s\, some features would be considered more important than others. After al
l\, feature engineering is one of the most critical steps in modeling. But
after the model is up and running\, what if we tweak the features just to
see how much meddling can affect the preference? This process is called p
ost-hoc feature attribution and it seeks to interpret the model behavior.
In this talk\, let us spoon through the interpretability of ML models.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Explaining Why You have a Favorite Cereal - Gatha
URL:https://global2022.pydata.org/cfp/talk/WHBJL9/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-YBVUM7@global2022.pydata.org
DTSTART:20221201T103000Z
DTEND:20221201T110000Z
DESCRIPTION:Data Centric AI is about iterating on *data* instead of models
to improve machine learning predictions. Why is this trend relevant *now*?
Is this yet another hype in data science? Or has something really changed
? And most of all -- how is this relevant to *you*?
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Data-Centric AI Cookbook: let's prep that data - Marysia Winkels
URL:https://global2022.pydata.org/cfp/talk/YBVUM7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CSJBCY@global2022.pydata.org
DTSTART:20221201T110000Z
DTEND:20221201T113000Z
DESCRIPTION:This session will discuss scaling your PyTorch models on TPUs.
We’ll also cover an overview of ML accelerators and distributed training
strategies. We’ll cover training on TPUs from beginning to end\, includ
ing setting them up\, TPU architecture\, frequently faced issues\, and deb
ugging techniques. You’ll learn about the experience of using the PyTorc
h XLA library and explore best practices for getting started with training
large-scale models on TPUs.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Supercharge your training on TPUs - Kaushik Bokka
URL:https://global2022.pydata.org/cfp/talk/CSJBCY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GQSGAD@global2022.pydata.org
DTSTART:20221201T113000Z
DTEND:20221201T130000Z
DESCRIPTION:Build data pipelines using Trino and dbt\, combining heterogene
ous data sources without having to copy everything into a single system. M
anage access to your data products using modern and flexible security prin
ciples from authentication methods to fine-grained access control. Run and
monitor your data pipelines using Dagster.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Building Data Products in a Lakehouse using Trino\, dbt\, and Dagst
er - Przemysław Denkiewicz\, Michiel De Smet
URL:https://global2022.pydata.org/cfp/talk/GQSGAD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3TXUMK@global2022.pydata.org
DTSTART:20221201T113000Z
DTEND:20221201T130000Z
DESCRIPTION:sktime is a widely used scikit-learn compatible library for lea
rning with time series. sktime is easily extensible by anyone\, and intero
perable with the pydata/numfocus stack. sktime has a rich framework for b
uilding pipelines across multiple learning tasks that it supports\, includ
ing forecasting\, time series classification\, regression\, clustering. T
his tutorial explains basic and advanced sktime pipeline constructs\, and
 introduces in detail the time series transformer\, which is the main com
 ponent in all types of pipelines. It is a continuation of the sktime in
 troductory tutorial at PyData Global 2021.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:sktime - python toolbox for time series: pipelines and transformers
- Franz Kiraly\, Benedikt Heidrich\, Mirae L Parker\, Martin Walter
URL:https://global2022.pydata.org/cfp/talk/3TXUMK/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3HJRZM@global2022.pydata.org
DTSTART:20221201T113000Z
DTEND:20221201T120000Z
DESCRIPTION:Meaningful probabilistic models do not only produce a “best g
uess” for the target\, but also convey their uncertainty\, i.e.\, a beli
ef in how the target is distributed around the predicted estimate. Busines
s evaluation metrics such as mean absolute error\, a priori\, neglect that
unavoidable uncertainty. This talk discusses why and how to account for u
ncertainty when evaluating models using traditional business metrics\, usi
ng python standard tooling. The resulting uncertainty-aware model rating s
atisfies the requirements of statisticians because it accounts for the pro
babilistic process that generates the target. It should please practitione
rs because it is based on established business metrics. It appeases execut
ives because it allows concrete quantitative goals and non-defensive judge
ments.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Knowing what you don’t know matters: Uncertainty-aware model rati
ng - Malte Tichy
URL:https://global2022.pydata.org/cfp/talk/3HJRZM/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-LHKSG7@global2022.pydata.org
DTSTART:20221201T113000Z
DTEND:20221201T120000Z
DESCRIPTION:Hello wait you talk see to can’t all my in!\n\nSounds weird\,
right?! Detecting abnormal sequences is a common problem.\nJoin my talk t
o see how this problem involves Bert\, Word2vec\, and Autoencoders\, and l
earn how you can also apply it to information security
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Detecting anomalous sequences using text processing methods - Liron
Faybish
URL:https://global2022.pydata.org/cfp/talk/LHKSG7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-T3N9MP@global2022.pydata.org
DTSTART:20221201T120000Z
DTEND:20221201T123000Z
DESCRIPTION:Numerous tools generate "explanations" for the outputs of machi
ne-learning models and similarly complex AI systems. However\, such “exp
lanations” are prone to misinterpretation and often fail to enable data
scientists or end-users to assess and scrutinize “an AI.” We share bes
t practices for implementing “explanations” that their human recipient
s understand.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Do You Follow What I’m Explaining? A Practitioner’s Guide to Op
ening the AI Black Box for Humans - Kilian Kluge
URL:https://global2022.pydata.org/cfp/talk/T3N9MP/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-VJ3T8A@global2022.pydata.org
DTSTART:20221201T120000Z
DTEND:20221201T123000Z
DESCRIPTION:This talk will show you how to build papermill plugins. As moti
vating examples\, we'll describe how to customize papermill for notebook d
ebugging and profiling.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Teaching papermill new tricks: creating custom engines for flexible
notebook execution - Eduardo Blancas
URL:https://global2022.pydata.org/cfp/talk/VJ3T8A/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-QK7B9M@global2022.pydata.org
DTSTART:20221201T123000Z
DTEND:20221201T130000Z
DESCRIPTION:Data is everywhere. It is through analysis and visualization th
at we are able to turn data into *information* that can be used to drive b
etter decision making. Out-of-the-box tools will allow you to create a cha
rt\, but if you want people to take action\, your numbers need to tell a c
ompelling story. Learn how elements of storytelling can be applied to data
visualization.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Data Storytelling through Visualization - Marysia Winkels
URL:https://global2022.pydata.org/cfp/talk/QK7B9M/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DQSXAX@global2022.pydata.org
DTSTART:20221201T123000Z
DTEND:20221201T130000Z
DESCRIPTION:In this talk\, I will introduce Zarr\, an open-source data for
 mat for storing chunked\, compressed N-dimensional arrays. This talk pres
 ents a systematic approach to understanding and implementing Zarr by sh
 owing how it works and why it is needed\, followed by a hands-on sessi
 on at the end. Zarr is based on an open technical specification\, maki
 ng implementations across several languages possible. I will mainly dis
 cuss Zarr’s Python implementation and show how it beautifully interope
 rates with the existing libraries in the PyData stack.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:The Beauty of Zarr - Sanket Verma
URL:https://global2022.pydata.org/cfp/talk/DQSXAX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-HSHG88@global2022.pydata.org
DTSTART:20221201T130000Z
DTEND:20221201T133000Z
DESCRIPTION:Correlation does not imply causation. It turns out\, however\,
that with some simple ingenious tricks one can unveil causal relationships
within standard observational data\, without having to resort to expensiv
e randomised control trials. Learn how to make the most out of your data\,
avoid misinterpretation pitfalls and draw more meaningful conclusions by
adding causal inference to your toolbox.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Start asking your data “Why?” - A Gentle Introduction To Causal
Inference - Eyal Kazin
URL:https://global2022.pydata.org/cfp/talk/HSHG88/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-9GYEJB@global2022.pydata.org
DTSTART:20221201T133000Z
DTEND:20221201T140000Z
DESCRIPTION:Getting your team to choose good projects\, reliably derisk the
m\, research ideas\, productionise the solutions and create positive chang
e in an organisation is hard. Really hard.\nI'll present patterns that wor
k for these 5 critical project stages. This guidance is based on 15 years
 of experience writing AI and DS solutions and 5 years giving both strategi
 c guidance and training on how to get to success.\nYou'll come away fro
 m the session with new techniques to help your team deliver successful
 ly and increase their confidence in the roadmap\, new thoughts on how t
 o diagnose your model's quality\, and new ideas to make a positive diffe
 rence in your organisation.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Data Science Project Patterns that Work - Ian Ozsvald
URL:https://global2022.pydata.org/cfp/talk/9GYEJB/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CCLNVD@global2022.pydata.org
DTSTART:20221201T133000Z
DTEND:20221201T140000Z
DESCRIPTION:The talk includes the presentation of Crowd-Kit - an open-sourc
e computational quality control library - followed by its demonstration.\n
Crowdsourced annotations in most cases require post-processing due to thei
r heterogeneous nature\; raw data contains errors\, is biased and non-triv
ial to combine. Crowd-Kit provides various methods like aggregation\, unce
rtainty\, and agreements\, which could be used as helping tools in getting
an interpretable result out of data labeled with the help of crowdsourcin
g.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Crowd-Kit: A Scikit-Learn for Crowdsourced Annotations - Evgeniya
URL:https://global2022.pydata.org/cfp/talk/CCLNVD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-Q8NDNK@global2022.pydata.org
DTSTART:20221201T140000Z
DTEND:20221201T150000Z
DESCRIPTION:Ada is the Founder of She Code Africa (SCA).
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - Ada Nduka Oyom - Ada Nduka Oyom
URL:https://global2022.pydata.org/cfp/talk/Q8NDNK/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-FRFVYQ@global2022.pydata.org
DTSTART:20221201T150000Z
DTEND:20221201T153000Z
DESCRIPTION:In today’s digital age\, we use machine learning (ML) and art
ificial intelligence (AI) to solve problems and improve productivity and e
fficiency. Yet\, there’s risk in delegating decision-making power to alg
orithmically based systems: their workings are often opaque\, turning them
into uninterpretable “black boxes.”
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Algorithms at Scale: Raising Awareness on Latent Inequities in Our
Data - Dr. Lalitha Krishnamoorthy
URL:https://global2022.pydata.org/cfp/talk/FRFVYQ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-LRRXLV@global2022.pydata.org
DTSTART:20221201T150000Z
DTEND:20221201T163000Z
DESCRIPTION:This tutorial is a hands-on introduction to Bayesian Decision A
nalysis (BDA)\, which is a framework for using probability to guide decisi
on-making under uncertainty. I start with Bayes's Theorem\, which is the f
oundation of Bayesian statistics\, and work toward the Bayesian bandit str
ategy\, which is used for A/B testing\, medical tests\, and related applic
ations. For each step\, I provide a Jupyter notebook where you can run Pyt
hon code and work on exercises. In addition to the bandit strategy\, I sum
marize two other applications of BDA\, optimal bidding and deriving a deci
sion rule. Finally\, I suggest resources you can use to learn more.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Bayesian Decision Analysis - Allen Downey
URL:https://global2022.pydata.org/cfp/talk/LRRXLV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WQJPKJ@global2022.pydata.org
DTSTART:20221201T150000Z
DTEND:20221201T163000Z
DESCRIPTION:Bytewax is an open source\, Python native\, framework and distr
ibuted processing engine for processing data streams that makes it easy to
build everything from pipelines for anonymizing data to more sophisticate
d systems for fraud detection\, personalization\, and more. For this tutor
ial\, we will cover how you can use Bytewax and the Python library\, River
\, to build an online machine learning system that will detect anomalies i
n IoT data from streaming systems like Kafka and Redpanda. This tutorial i
s for data scientists\, data engineers\, and machine learning engineers in
terested in machine learning and streaming data. At the end of the tutoria
l session you will know how to:\n- run a streaming platform like Kafka or
Redpanda in a docker container\,\n- develop a Bytewax dataflow\n- run a Ri
ver anomaly detection algorithm to detect anomalous data\n\nThe tutorial m
aterial will be available via a GitHub Repo and the content will be covere
d in roughly the timeline shown below.\n\n- 0-10min - Introduction to stre
am processing and online machine learning\n- 10-30min - Setup streaming sy
stem and prepare the data\n- 30-60min - Write the Bytewax dataflow and ano
maly detector code\n- 60-90min - Tune the anomaly detector and run the Byt
ewax dataflow successfully.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Anomaly Detection on Streaming Data in Python using Bytewax and Riv
er - Zander
URL:https://global2022.pydata.org/cfp/talk/WQJPKJ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DVEVPE@global2022.pydata.org
DTSTART:20221201T150000Z
DTEND:20221201T153000Z
DESCRIPTION:Recently\, Sam Gross\, the author of the nogil fork of Pyth
 on 3.9\, demonstrated that the GIL can be removed. For scientific progra
 ms that use heavy CPU-bound processes\, this could be a huge performan
 ce improvement. In this talk\, we will see if this is true and compare t
 he nogil version to the original.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Trying No GIL on Scientific Programming - Cheuk Ting Ho
URL:https://global2022.pydata.org/cfp/talk/DVEVPE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DP7GJC@global2022.pydata.org
DTSTART:20221201T150000Z
DTEND:20221201T160000Z
DESCRIPTION:Executives at PyData is a facilitated discussion session for le
aders on the challenges around designing and delivering successful project
s\, organizational communication\, product management and design\, hiring\
, and team growth.\n\n[Join here](https://numfocus-org.zoom.us/j/811736
13104?pwd=R1ZveFNkRit3ZnFDWkdlR1FGZFJEdz09)
DTSTAMP:20240328T151843Z
LOCATION:Community Events & Sponsor Sessions
SUMMARY:Executives at PyData - Ian Ozsvald\, Lauren Oldja\, Douglas Squirre
l
URL:https://global2022.pydata.org/cfp/talk/DP7GJC/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-9KFH9E@global2022.pydata.org
DTSTART:20221201T153000Z
DTEND:20221201T160000Z
DESCRIPTION:Dask is a framework for parallel computing in Python.\nIt's gre
at\, until you need to set it up.\n\nKubernetes? Cloud? HPC? SSH? YARN
/Hadoop even?\nWhat's the right deployment technology to choose?\n\nAfter
you set it up a new set of problems arise:\n\n- How do you install softw
are across the cluster?\n- How do you secure network access?\n- How do
you access secure data that needs credentials?\n- How do you track who
uses it and constrain costs?\n- When things break\, how do you track the
m down?\n\nThere exist solutions to these problems in open source packages
like dask-kubernetes\, helm charts\, dask-cloudprovider\, and dask-gatewa
y\, as well as commercially supported products like Coiled\, Saturn\, QHub
\, AWS EMR\, and GCP Dataproc. How do we choose?\n\nThis talk describes t
he problem faced by people trying to deploy *any* distributed computing sy
stem\, and tries to construct a framework to help them make decisions on h
ow to deploy.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Deploying Dask - Matthew Rocklin
URL:https://global2022.pydata.org/cfp/talk/9KFH9E/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DWNLKQ@global2022.pydata.org
DTSTART:20221201T160000Z
DTEND:20221201T163000Z
DESCRIPTION:Domain experts often need to create text classification models\
; however\, they may lack ML or coding expertise to do so. In this talk\,
we show how domain experts can create text classifiers without writing a s
ingle line of code through the open-source\, no-code Label Sleuth system (
[www.label-sleuth.org](https://www.label-sleuth.org))\; a system that comb
ines an intuitive labeling UI with active learning techniques and integrat
ed model training functionality. Finally\, we describe how the system can
also benefit more technical users\, such as data scientists\, and develope
rs\, who can customize it for more advanced usage.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Create text classifiers in a few hours using the open-source\, no-c
ode Label Sleuth system - Yannis Katsis\, Eyal Shnarch
URL:https://global2022.pydata.org/cfp/talk/DWNLKQ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-NFQFLS@global2022.pydata.org
DTSTART:20221201T160000Z
DTEND:20221201T163000Z
DESCRIPTION:We’re on a global mission to make open source software thrive
and be more sustainable—from supporting open source contributors in the
ir career paths with our Open Source Professional Network (OSPN) to helpin
g organizations transform their business with support from our vetted netw
ork of enterprise solution architects (ESA Network) to helping our clients
select the right open source software stack for their business challenge
by leveraging our AI-driven scoring system. Please join us during Sponsor
Open Hours to learn more and ask us anything about open source.
DTSTAMP:20240328T151843Z
LOCATION:Community Events & Sponsor Sessions
SUMMARY:OpenTeams’ AMA with Travis Oliphant\, Lalitha Krishnamoorthy & Fa
tma Tarlaci -
URL:https://global2022.pydata.org/cfp/talk/NFQFLS/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-PB8EVG@global2022.pydata.org
DTSTART:20221201T160000Z
DTEND:20221201T163000Z
DESCRIPTION:The Jupyter ecosystem has been undergoing many changes in the p
ast few years. While JupyterLab has been embraced by many\, there are stil
l many active users of Jupyter Notebook. With that in mind\, Jupyter devel
opers have been gearing up for the release of the updated Notebook 7 based
on JupyterLab components as outlined in the [Jupyter Enhancement Proposal
#79](https://jupyter.org/enhancement-proposals/79-notebook-v7/notebook-v7
.html). With this\, there are significant changes coming to Notebook 6\, o
f which the upcoming Notebook 6.5 is intended to be end-of-life\, and user
s installing Notebook will soon receive a version of the project that may
disrupt their workflows. In an effort to give users time to transition to
using the updated codebase\, the NbClassic project has been introduced. Nb
Classic is the Jupyter Server extension implementation of the classical no
tebook. NbClassic has also become the owner of the static assets for the c
lassical notebook\, and Notebook 6.5 depends on NbClassic to provide those
. \nThe aim of this talk is to:\n1. Reflect on the changes to the Jupyter
ecosystem with the introduction of NbClassic and Notebook 7.\n2. Address s
ome questions that may come up about NbClassic and Notebook 6.5\, as well
as some of those that may come up once Notebook 7 is released.\n3. Showcas
e the feasibility with which users can use the different front-ends NbClas
sic\, Notebook 7 and JupyterLab with a demo.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:An Evolving Jupyter Notebook - Rosio Reyes
URL:https://global2022.pydata.org/cfp/talk/PB8EVG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-YJCYE3@global2022.pydata.org
DTSTART:20221201T163000Z
DTEND:20221201T170000Z
DESCRIPTION:Pandas’ current behavior on whether indexing returns a view o
r copy is confusing\, even for experienced users. But it doesn’t have to
be this way. We can make this aspect of pandas easier to grasp by simplif
ying the copy/view rules\, and at the same time make pandas more memory-ef
ficient. And get rid of the SettingWithCopyWarning.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:On copies and views: updating pandas' internals (a.k.a. “Getting
rid of the SettingWithCopyWarning”) - Joris Van den Bossche
URL:https://global2022.pydata.org/cfp/talk/YJCYE3/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-QUUNUE@global2022.pydata.org
DTSTART:20221201T163000Z
DTEND:20221201T170000Z
DESCRIPTION:PyScript has brought change to the Python and PyData ecosyst
 em\, making it much easier to execute Python in the browser and openi
 ng the road to many possibilities that were not available before. The t
 alk will explore what has happened since we presented it and how PyScri
 pt can change the way we do Data Science and many other things.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:PyScript and Data Science: a love story - Fabio Pliger
URL:https://global2022.pydata.org/cfp/talk/QUUNUE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-G3KMVW@global2022.pydata.org
DTSTART:20221201T170000Z
DTEND:20221201T180000Z
DESCRIPTION:AI is the future of software development
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - Thomas Dohmke - Thomas Dohmke
URL:https://global2022.pydata.org/cfp/talk/G3KMVW/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-XTR833@global2022.pydata.org
DTSTART:20221201T180000Z
DTEND:20221201T183000Z
DESCRIPTION:Stuck with long-running code that takes too long to complete\,
if ever? Learn to think strategically about parallelizing your workflows\,
including the characteristics that make a workflow a good candidate for p
arallelization as well as the options in python for executing parallelizat
ion. The talk eschews PySpark or other big data platforms.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Parallelization of code in Python for beginners - Cheryl Roberts
URL:https://global2022.pydata.org/cfp/talk/XTR833/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-TGTEMT@global2022.pydata.org
DTSTART:20221201T180000Z
DTEND:20221201T183000Z
DESCRIPTION:Many Python data professionals work daily in JupyterLab or Note
book instances. What can a hacker do with access to that system? In this p
resentation\, I will introduce the threat model and show why Jupyter insta
nces are valuable targets. Next\, I will demonstrate several post-exploita
tion activities that someone may try to perform on systems hosting Jupyter
instances. We will conclude with some defensive strategies to minimize th
e likelihood and impact of these activities. This talk will help data scie
ntists and information technology professionals better understand the pers
pective of potential attackers operating in Jupyter environments to improv
e defensive awareness and behavior.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Mischief Managed: What hackers can do on your Jupyter instance - Jo
seph Lucas
URL:https://global2022.pydata.org/cfp/talk/TGTEMT/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-LUYPAE@global2022.pydata.org
DTSTART:20221201T183000Z
DTEND:20221201T190000Z
DESCRIPTION:Extracting the highly valuable data from unstructured text ofte
n results in hard-to-read\, brittle\, difficult-to-maintain code. The prob
lem is that using regular expressions directly embedded in the program con
trol flow does not provide the best level of abstraction. We propose a que
ry language (based on the tuple relational calculus) that facilitates data
extraction. Developers can explicitly express their intent declaratively\
, making their code much easier to write\, read\, and maintain.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Text to Data: Make Your Code Malleable\, Not Brittle - David Barret
t\, Martha L Escobar-Molano
URL:https://global2022.pydata.org/cfp/talk/LUYPAE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-JSXXGE@global2022.pydata.org
DTSTART:20221201T183000Z
DTEND:20221201T190000Z
DESCRIPTION:The Ray project has shown that having a shared memory facility
greatly helps in certain compute problems\, particularly where the job can
be performed on a single large machine as opposed to a cluster. We present
preliminary work showing that Dask can also achieve the same benefits.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Single node shared memory comes to dask - Martin Durant
URL:https://global2022.pydata.org/cfp/talk/JSXXGE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CRABSP@global2022.pydata.org
DTSTART:20221201T190000Z
DTEND:20221201T200000Z
DESCRIPTION:RStudio recently changed its name to Posit to reflect the fact
that we're already a company that does more than just R. Come along to thi
s talk to hear a few of the reasons that we love R\, and to learn about so
me of the open source tools we're working on for Python.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Embracing multi-lingual data science - Hadley Wickham
URL:https://global2022.pydata.org/cfp/talk/CRABSP/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-J7NHFJ@global2022.pydata.org
DTSTART:20221201T200000Z
DTEND:20221201T203000Z
DESCRIPTION:Apache Airflow is a foundational component of data platform orc
hestration at Shopify. In this talk\, we'll dive into the many performance
and reliability challenges we’ve encountered running Airflow at Shopify
’s scale\, our custom tooling\, and the new multi-instance architecture
we rolled out.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Running Apache Airflow at Scale - Jean-Martin Archer\, Michael Petr
o
URL:https://global2022.pydata.org/cfp/talk/J7NHFJ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-EWZ3H7@global2022.pydata.org
DTSTART:20221201T200000Z
DTEND:20221201T203000Z
DESCRIPTION:Machine Learning models designed to work with streaming systems
make decisions on new data points as they arrive. But there is a downside
: model decisions can't be easily changed later when the model is updated
with fresher data\, user feedback\, or freshly tuned hyperparameters. This
is often a blocker for anomaly detection\, recommender systems\, process
mining\, and human-in-the-loop planning. \n\nTo deal with this\, we'll dem
onstrate design patterns to easily express reactive data processing logic.
We will use [Pathway](https://pathway.com)\, a scalable data processing f
ramework built around a Python programming interface. Pathway is battle-te
sted with operational data in enterprise\, including graphs and event stre
ams in real-world supply chains\, and is now launching as open-core. \n\nY
ou will leave the talk with a thorough understanding of the practical engi
neering challenges behind reactive data processing with a Machine Learning
angle to it\, and the steps needed to overcome these challenges.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Reactive data processing in Python - Adrian Kosowski
URL:https://global2022.pydata.org/cfp/talk/EWZ3H7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-RDJWH8@global2022.pydata.org
DTSTART:20221201T203000Z
DTEND:20221201T213000Z
DESCRIPTION:Apache Airflow is a foundational component of data platform orc
hestration at Shopify. Following the main talk\, this session is scheduled
for you to ask about and discuss running Airflow at scale with Jean-Martin
Archer\, Staff Data Engineer at Shopify and Michael Petro\, Data Engineer
at Shopify.
DTSTAMP:20240328T151843Z
LOCATION:Community Events & Sponsor Sessions
SUMMARY:Apache Airflow at Scale: Let's Discuss - Jean-Martin Archer\, Micha
el Petro
URL:https://global2022.pydata.org/cfp/talk/RDJWH8/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-AEWSCP@global2022.pydata.org
DTSTART:20221201T203000Z
DTEND:20221201T220000Z
DESCRIPTION:Lightning Talks are short 5-10 minute sessions presented by
community members on a variety of interesting topics.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Lightning Talks - Brian Skinn\, Kacper Łukawski\, Kurt Schelfthout
\, Richard Lee\, Allan Campopiano\, Eyal Kazin\, Ziheng Wang\, Caroline Ar
nold
URL:https://global2022.pydata.org/cfp/talk/AEWSCP/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-TJ9VQU@global2022.pydata.org
DTSTART:20221201T203000Z
DTEND:20221201T210000Z
DESCRIPTION:All languages are rich in prose and poetry. A lot of the litera
ture is inaccessible because of a lack of understanding of that language.
It is often difficult to appreciate a simple translation of a poem due to
gaps in cultural knowledge. A poem translated in the style of an author f
amiliar to the reader might help to both add cultural context for the read
er and capture the essence of the poem itself.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Urdu poems to Shakespearean English - Machine Translation - Sidra E
ffendi
URL:https://global2022.pydata.org/cfp/talk/TJ9VQU/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-7JEXSU@global2022.pydata.org
DTSTART:20221201T210000Z
DTEND:20221201T213000Z
DESCRIPTION:For video advertisers\, precisely hitting their ad performance
goals is critical. Undershooting on campaign viewability objectives means
spending money on ads that nobody watches\, while overshooting them can m
ean vastly reducing the available ad slots. At JW Player\, we combine pred
ictive models with PID controllers to tune decision thresholds and deliver
the maximum possible reach to our advertisers while hitting their goals.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Using feedback loops to tune predictive models in a video ad market
place - Emily Hopper
URL:https://global2022.pydata.org/cfp/talk/7JEXSU/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-AH9DJD@global2022.pydata.org
DTSTART:20221201T213000Z
DTEND:20221201T220000Z
DESCRIPTION:Data science practitioners have a saying that 80% of their ti
me gets spent on data prep. Often this involves tools such as Pandas and J
upyter. Graph Data Science is similar\, except the data prep techniques ar
e highly specialized and computationally expensive. Moreover\, data prep f
or graphs is required **before** commercial tools such as graph databases
or visualization can be used effectively. This talk shows examples of data
prep for graphs. A progressive example illustrates the challenges plus te
chniques that leverage open source integrations with the PyData stack: Arr
ow/Parquet\, PSL\, Ray\, Keyvi\, Datasketch\, etc.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Data Prep for Graphs - Paco Nathan
URL:https://global2022.pydata.org/cfp/talk/AH9DJD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-F9AQCV@global2022.pydata.org
DTSTART:20221201T220000Z
DTEND:20221201T223000Z
DESCRIPTION:Introducing a new project\, Compute over Data (Bacalhau)\, to r
un any computation on decentralized data. No need to move large datasets &
all languages/data are supported. If you can run Docker/WASM\, you're in
the game!\nBacalhau is a decentralized public computation network that tak
es a job and moves it near where the data is stored\, including across a d
ecentralized server network that stores data and runs jobs inside it. Baca
lhau runs the job near where data lives and eliminates data management fo
r the user.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Revolutionizing the Big Data Age With Compute over Data - David Aro
nchick
URL:https://global2022.pydata.org/cfp/talk/F9AQCV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-VKEWPE@global2022.pydata.org
DTSTART:20221201T220000Z
DTEND:20221201T223000Z
DESCRIPTION:Data science as a professional discipline is still in its infan
cy\, and our field lacks widespread technical norms around project organiz
ation\, collaboration\, and reproducibility. This is painful both for prac
titioners and their end users because disorganized analysis is bad analysi
s\, and bad analysis costs money and wastes time. This talk presents ten p
rinciples for correct and reproducible data science inheriting from softwa
re engineering’s seven decades of hard-earned lessons as well as numerou
s experiences with data science teams at organizations of all sizes. We mo
tivate these principles by looking at some hard truths about data science
“in the wild.”
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:The 10 commandments of reliable data science - Isaac Slavitt
URL:https://global2022.pydata.org/cfp/talk/VKEWPE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-HH7MMC@global2022.pydata.org
DTSTART:20221201T223000Z
DTEND:20221201T230000Z
DESCRIPTION:KerasCV offers a complete set of APIs to train your own state-o
f-the-art\,\nproduction-grade object detection model. These APIs include
object detection specific\ndata augmentation techniques\, models\, and COC
O metrics.\n\nThis talk covers how to train a RetinaNet on your own datase
t using KerasCV
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Object Detection with KerasCV - Lucas Wood
URL:https://global2022.pydata.org/cfp/talk/HH7MMC/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-D9XHMC@global2022.pydata.org
DTSTART:20221201T223000Z
DTEND:20221201T230000Z
DESCRIPTION:Where are CA’s frequent\, high quality transit corridors? The
CA Public Resources Code defines them\, but it requires continued access t
o the General Transit Feed Specification (GTFS) data and fairly complex ge
ospatial processing. The Integrated Travel Project within Caltrans tackles
this by leveraging the combined powers of Dask and Python to make this dat
aset publicly available and updated monthly on the CA open data portal.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:The Dask at Hand: Using Dask to Speed up the High Quality Transit A
reas dataset for the CA Open Data Portal. - Tiffany Chu
URL:https://global2022.pydata.org/cfp/talk/D9XHMC/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WYCBXN@global2022.pydata.org
DTSTART:20221202T080000Z
DTEND:20221202T093000Z
DESCRIPTION:Visual Studio Code is one of the most popular editors in the Py
thon and data science communities\, and the extension ecosystem makes it e
asy for users to easily customize their workspace for the tools and framew
orks they need.\nJupyter notebooks are one such popular tool\, and there a
re some really great features for working in notebooks that can reduce con
text switching\, enable multi-tool workflows\, and utilize powerful Python
IDE features in notebooks.\nThis tutorial is geared for all Jupyter Noteb
ook users\, who either have interest in or are regularly using VS Code.\nP
articipants will learn how to use some of the best VS Code features for Ju
pyter Notebooks\, as well as a bunch of other tips and tricks to run\, vis
ualize and share your notebooks in VS Code.\nSome familiarity with Jupyter
Notebooks is required\, but experience with VS Code is not necessary.\nMa
terials and sample notebooks for the tutorial will be hosted on GitHub\, w
hich participants will be able to launch in their browser in the VS Code e
ditor with GitHub Codespaces with no local setup.\nParticipants who have V
S Code installed locally are also encouraged to open one of their own note
books and try out the features as we go along.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Level up your Jupyter Notebooks with VS Code - Sarah Kaiser (She/He
r)
URL:https://global2022.pydata.org/cfp/talk/WYCBXN/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CSPMJD@global2022.pydata.org
DTSTART:20221202T080000Z
DTEND:20221202T083000Z
DESCRIPTION:We present BastionAI\, a new framework for privacy-preserving d
eep learning leveraging secure enclaves and Differential Privacy. \nWe pro
vide promising first results on fine-tuning a BERT model on the SMS Spam C
ollection Data Set within a secure enclave with Differential Privacy.\nThe
library is available at https://github.com/mithril-security/bastionai.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:BastionAI: Towards an Easy-to-use Privacy-preserving Deep Learning
Framework - Daniel Huynh
URL:https://global2022.pydata.org/cfp/talk/CSPMJD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-XTYXRG@global2022.pydata.org
DTSTART:20221202T083000Z
DTEND:20221202T090000Z
DESCRIPTION:We like talking about production – one famous\, but probably
wrong statement about it is “87% of data science projects never make it
to production”.\n\nWhile giving a talk to a group of up-and-coming data
scientists\, a question that surprised me came up: \n\n**When you say “p
roduction”\, what exactly do you mean?**\n\nBuzzwords are great\, but al
l the cool kids know what production is\, right? Wrong.\n\nIn this talk\,
we’ll define what production actually means. I’ll present a first-prin
ciples\, step-by-step approach to thinking about deploying a model to prod
uction. We’ll talk about challenges you might face in each step\, and pr
ovide further reading if you want to dive deeper into each one.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:ML in Production – What does “Production” even mean? - Dean Pleban
URL:https://global2022.pydata.org/cfp/talk/XTYXRG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-T9HHPK@global2022.pydata.org
DTSTART:20221202T090000Z
DTEND:20221202T093000Z
DESCRIPTION:Mostly\, people relate Artificial Intelligence to progress\, in
telligence and productivity. But with this comes unfair decisions\, biases
\, human workforce being replaced\, lack of privacy and security. And to m
ake matters worse\, a lot of these problems are specific to AI. This indic
ates that the rules and regulations in place are inadequate to deal with t
hem. Responsible AI comes into play in this situation. It seeks to resolve
these problems and establish AI system responsibility. In this talk I am
going to talk about What is Responsible AI\, Why is it needed\, How it can
be implemented\, What are the various frameworks for Responsible AI and W
hat is the Future?
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Responsible AI - What\, Why\, How and Future! - Dr. Sonal Kukreja
URL:https://global2022.pydata.org/cfp/talk/T9HHPK/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CW9ZZX@global2022.pydata.org
DTSTART:20221202T090000Z
DTEND:20221202T093000Z
DESCRIPTION:In hypothesis testing the stopping criterion for data collectio
n is a non-trivial question that puzzles many analysts. This is especially
true with sequential testing where demands for quick results may lead to
biassed ones. \n\nI show how the belief that Bayesian approaches magically
resolve this issue is misleading and how to obtain reliable outcomes by f
ocusing on sample precision as a goal.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Don't Stop 'til You Get Enough - Hypothesis Testing Stop Criterion
with “Precision Is The Goal” - Eyal Kazin
URL:https://global2022.pydata.org/cfp/talk/CW9ZZX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3GZMQP@global2022.pydata.org
DTSTART:20221202T093000Z
DTEND:20221202T100000Z
DESCRIPTION:Data practitioners are typically forced to choose between tools
that are either easy to use (pandas) or highly scalable (Spark\, SQL\, etc
.). Modin\, an open source project originally developed by researchers at
UC Berkeley\, is a highly scalable\, drop-in replacement for pandas. \n\nT
his talk will give an overview of Modin and practical examples on how to u
se it to effortlessly scale up your pandas workflows.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Supercharging your pandas workflows with Modin - Alejandro Herrera
URL:https://global2022.pydata.org/cfp/talk/3GZMQP/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CBK8RJ@global2022.pydata.org
DTSTART:20221202T100000Z
DTEND:20221202T103000Z
DESCRIPTION:Machine learning models degrade with time. You need to update a
nd retrain them regularly. However\, the decision on the maintenance appro
ach is often arbitrary\, and the models are simply retrained on a schedule
or after every new batch. This can lead to suboptimal performance or wast
ed resources. In this talk\, I will discuss how we can do better: from est
imating the speed of the model decay in advance to constructing a proper e
valuation set.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Why we do ML model retraining wrong\, and how to do better - Emeli
Dral
URL:https://global2022.pydata.org/cfp/talk/CBK8RJ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-BNUAL8@global2022.pydata.org
DTSTART:20221202T100000Z
DTEND:20221202T120000Z
DESCRIPTION:There is a rich ecosystem of libraries for Bayesian analysis in
Python and it is necessary to use multiple libraries at the same time to
use a Bayesian workflow\, from model creation to presenting results going
through sampling and model checking.\n\nThis working session aims to bring
together practitioners to discuss and address interoperability issues wit
hin the ecosystem. Attendees should expect a hands-on get together where t
hey will meet other Bayesian practitioners with whom to discuss the issues
faced and contribute to open source libraries with issues\, pull requests
and discussions.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Working session for the Bayesian Python Ecosystem - Oriol Abril Pla
URL:https://global2022.pydata.org/cfp/talk/BNUAL8/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-AQLSAY@global2022.pydata.org
DTSTART:20221202T100000Z
DTEND:20221202T103000Z
DESCRIPTION:Recent advances in natural language processing demonstrate the
capability of large-scale language models (such as GPT-3) to solve a varie
ty of NLP problems with zero shots\, shifting from supervised fine-tuning
to prompt engineering/tuning.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Building Large-scale\, Localized Language Models: From Data Prepara
tion to Training and Deployment to Production. - Miguel Martínez\, Meriem
Bendris
URL:https://global2022.pydata.org/cfp/talk/AQLSAY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-AMGU99@global2022.pydata.org
DTSTART:20221202T100000Z
DTEND:20221202T113000Z
DESCRIPTION:Want to create beautiful and complex visualisations of your dat
a with concise code? Look no further than Seaborn\, Python’s fantastic p
lotting library which builds on the hugely popular Matplotlib package. Thi
s hands-on tutorial will provide you with all the necessary tools to commu
nicate your data insights with Seaborn.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Data visualisation with Seaborn - Myles Mitchell\, Parisa Gregg
URL:https://global2022.pydata.org/cfp/talk/AMGU99/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-YQUZFW@global2022.pydata.org
DTSTART:20221202T103000Z
DTEND:20221202T110000Z
DESCRIPTION:OpenSearch is an open source document database with search and
aggregation superpowers\, based on Elasticsearch. This session covers how
to use OpenSearch to perform both simple and advanced searches on semi-str
uctured data such as a product database.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Super Search with OpenSearch and Python - Laysa Uchoa
URL:https://global2022.pydata.org/cfp/talk/YQUZFW/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GLJX3M@global2022.pydata.org
DTSTART:20221202T103000Z
DTEND:20221202T110000Z
DESCRIPTION:Python has many different packages that are useful for working
with different kinds of geographical data. This presentation will introduc
e several of these packages and show you how you can get started working w
ith geolocated information and presenting insights on maps.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Maps\, Maps\, Maps! - Geir Arne Hjelle
URL:https://global2022.pydata.org/cfp/talk/GLJX3M/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-XKTAWW@global2022.pydata.org
DTSTART:20221202T110000Z
DTEND:20221202T113000Z
DESCRIPTION:pandas has rapidly become one of the most popular tools for dat
a analysis\, but is limited by its inability to scale to large datasets. W
e developed Modin\, a scalable\, drop-in alternative to pandas\, that pres
erves the dynamic and flexible behavior of pandas dataframes while enhanci
ng the scalability.\n\nThis talk will walk you through our team’s resear
ch at UC Berkeley\, which enabled the development of Modin. We’ll also d
iscuss our latest publication at VLDB\, which covers a novel approach to p
arallelization and metadata management techniques for dataframes.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:How to maximally parallelize the entire pandas API - Rehan Durrani
URL:https://global2022.pydata.org/cfp/talk/XKTAWW/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-FULGZZ@global2022.pydata.org
DTSTART:20221202T113000Z
DTEND:20221202T120000Z
DESCRIPTION:Transformer models became state-of-the-art in natural language
processing. Word representations learned by these models offer great flexi
bility for many types of downstream tasks from classification to summariza
tion. Nonetheless\, these representations suffer from certain conditions t
hat impair their effectiveness. Researchers have demonstrated that BERT an
d GPT embeddings tend to cluster in a narrow cone of the embedding space w
hich leads to unwanted consequences (e.g. spurious similarities between un
related words). During the talk we’ll introduce SimCSE – a contrastive
learning method that helps to regularize the embeddings and reduce the pr
oblem of anisotropy. We will demonstrate how SimCSE can be implemented in
Python.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:BERT's Achilles' heel? Applying contrastive learning to fight aniso
tropy in language models. - Aleksander Molak
URL:https://global2022.pydata.org/cfp/talk/FULGZZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-M99HFZ@global2022.pydata.org
DTSTART:20221202T113000Z
DTEND:20221202T120000Z
DESCRIPTION:Understanding dependencies between features is crucial in the p
rocess of developing and interpreting black-box ML models. Mistreating or
neglecting this aspect can lead to incorrect conclusions and\, consequenti
ally\, sub-optimal or wrong decisions leading to financial losses or other
undesired outcomes. Many common approaches to explain ML models – as si
mple as feature importance or more advanced methods such as SHAP – can y
ield misleading results if mutual feature dependencies are not taken into
account. \n \nIn this talk we present FACET 2.0 - a new approach for glo
bal feature explanations using a new technique called SHAP vector projecti
on\, open-sourced at: https://github.com/BCG-Gamma/facet/.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Exploring Feature Redundancy and Synergy with FACET 2.0 - and Why Y
ou Need It to Interpret ML Models Correctly - Mateusz Sokół\, Jan Ittner
URL:https://global2022.pydata.org/cfp/talk/M99HFZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-FBLQAZ@global2022.pydata.org
DTSTART:20221202T120000Z
DTEND:20221202T123000Z
DESCRIPTION:Automatic testing for ML pipelines is hard. Part of the execute
d code is a model that was dynamically trained on a fresh batch of data\,
and silent failures are common. Therefore\, it’s problematic to use know
n methodologies such as automating tests for predefined edge cases and tra
cking code coverage.\nIn this talk we’ll discuss common pitfalls with ML
models\, and cover best practices for automatically validating them: What
should be tested in these pipelines? How can we verify that they'll behav
e as we expect once in production?\nWe’ll demonstrate how to automate te
sts for these scenarios and introduce a few open-source testing tools that
can aid the process.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:How to Properly Test ML Models & Data - Shir Chorev
URL:https://global2022.pydata.org/cfp/talk/FBLQAZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3PFAEF@global2022.pydata.org
DTSTART:20221202T120000Z
DTEND:20221202T123000Z
DESCRIPTION:To develop mature data science\, machine learning\, and deep le
arning applications\, one must develop a large number of pipeline componen
ts\, such as data loading\, feature extraction\, and frequently a multitud
e of machine learning models.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Machine Learning Frameworks Interoperability - Christian Hundt\, Mi
guel Martínez
URL:https://global2022.pydata.org/cfp/talk/3PFAEF/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-ZKRAQY@global2022.pydata.org
DTSTART:20221202T120000Z
DTEND:20221202T133000Z
DESCRIPTION:Sharing and explaining the results of your analysis can be a lo
t easier and much more fun when you can create an animated story of the ch
arts containing your insights. [ipyvizzu-story](https://github.com/vizzuhq
/ipyvizzu-story) - a new open-source presentation tool for Jupyter & Datab
ricks notebooks and similar platforms - enables just that using a simple P
ython interface. \n\nIn this workshop\, one of the creators of ipyvizzu-st
ory introduces this tool and helps the audience take the first steps in ut
ilizing the power of animation in data storytelling. After the workshop\,
the members can build and present animated data stories independently.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:ipyvizzu-story - a new\, open-source charting tool to build\, creat
e and share animated data stories with Python in Jupyter - Peter Vidos
URL:https://global2022.pydata.org/cfp/talk/ZKRAQY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-N3SV3B@global2022.pydata.org
DTSTART:20221202T123000Z
DTEND:20221202T130000Z
DESCRIPTION:Identifying the right tools to enable high performance machine
learning may be overwhelming as the ecosystem continues to grow at break-n
eck speed. This becomes particularly emphasised when dealing with the
increasingly popular large language and image generation models such as
GPT2\, OTP and DALL-E\, among others. In this session we will dive into
a practical showcase where we will be productionising the large image gene
ration model DALL-E\, and showcase some optimizations that can be introduc
ed as well as considerations as the use-cases scale. By the end of this se
ssion practitioners will be able to run their own DALL-E powered applicati
ons as well as integrate these with functionalities from other large langu
age models like GPT2\, etc. We will be leveraging key tools in the Python
ecosystem to achieve this\, including Pytorch\, HuggingFace\, FastAPI and
MLServer.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Industrial Strength DALLE-E: Scaling Complex Large Text & Image Mod
els - Alejandro Saucedo
URL:https://global2022.pydata.org/cfp/talk/N3SV3B/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DWNRP9@global2022.pydata.org
DTSTART:20221202T123000Z
DTEND:20221202T130000Z
DESCRIPTION:Most organisations have implemented some kind of dashboard to m
onitor their data\, processes\, or business. However\, many dashboard solu
tions come with a caveat – either the licensing costs\, lack of transpar
ency in the workflows\, limited creativity\, or they cannot be connected t
o existing infrastructure. \nThis talk is aimed at Data Scientists\, Data
Engineers\, Data Practitioners and Managers struggling with choosing betwe
en a myriad of commercial dashboard solutions and DIY. We present how to c
reate your own dashboard using open-source Python technologies like FastAP
I\, SQLAlchemy\, and Celery and the challenges involved. We look back at t
he pitfalls and solutions we have worked on over the past 3 years. The goa
l is not to present our unique solution\, but to show how we can combine d
ifferent Python libraries to implement custom solutions to solve different
use cases. Attendees should be familiar with the basic concepts of web in
frastructure. Previous knowledge of any libraries is not required. We hope
to provide a starting point to build your custom dashboard solution using
open-source tooling.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Lessons Learned Building Our Own Dashboard Solution Using Open-Sour
ce Technologies - Jan Dix\, Zornitsa Manolova\, Dominik Jany\, Camille Koe
nders
URL:https://global2022.pydata.org/cfp/talk/DWNRP9/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-Y9VFDD@global2022.pydata.org
DTSTART:20221202T130000Z
DTEND:20221202T150000Z
DESCRIPTION:Numerous scientific disciplines have noticed a reproducibility
crisis of published results. While this important topic was being addresse
d\, the danger of non-reproducible and unsustainable research artefacts us
ing machine learning in science arose. The brunt of this has been avoided
by better education of reviewers who nowadays have the skills to spot insu
fficient validation practices. However\, there is more potential to furthe
r ease the review process\, improve collaboration and make results and mod
els available to fellow scientists. This workshop will teach practical les
sons that can be directly applied to elevate the quality of ML application
s in science by scientists.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Real-world Perspectives to Avoid the Worst Mistakes using Machine L
earning in Science - Jesper Dramsch\, Valerio Maggio\, Gemma Turon\, Mike
Walmsley\, Goku Mohandas
URL:https://global2022.pydata.org/cfp/talk/Y9VFDD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-JRLBJQ@global2022.pydata.org
DTSTART:20221202T130000Z
DTEND:20221202T133000Z
DESCRIPTION:Getting predictions from transformer models such as BERT requir
es two steps: first to query the tokenizer and then feed the outputs to th
e deep learning model itself. These two parts of the model are kept under
different class implementations in popular open source implementations lik
e Huggingface Transformers and Sentence-Transformers. This works well with
in Python\, but when one wants to put such a model in production or conver
t it to more efficient formats like ONNX that may be served by other langu
ages\, such as JVM-based ones\, it is preferable and simpler (and less ris
ky) to have
a single artifact that is directly queried. This talk builds on the popul
ar sentence-transformers library and shows how one can transform a sentenc
e-transformer model into a single tensorflow artifact that can be queried
with strings and is ready for serving. At the end of the talk the audience
will get a better understanding of the architecture of sentence-transform
ers and the required steps for converting a sentence-transformer model to
a single tensorflow graph. The code is released as a set of notebooks so t
hat the audience can replicate the results.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Converting sentence-transformers models to a single tensorflow grap
h - Georgios Balikas
URL:https://global2022.pydata.org/cfp/talk/JRLBJQ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-V9NFUV@global2022.pydata.org
DTSTART:20221202T130000Z
DTEND:20221202T133000Z
DESCRIPTION:Organisations have been increasingly adopting and integrating a
non-trivial number of different frameworks at each stage of their machine
learning lifecycle. Although this has helped reduce time-to-value for rea
l-world AI use-cases\, it has come at a cost of complexity and interoperab
ility bottlenecks.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Metadata Systems for End-to-End Data & Machine Learning Platforms -
Alejandro Saucedo
URL:https://global2022.pydata.org/cfp/talk/V9NFUV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-Q3SBSF@global2022.pydata.org
DTSTART:20221202T133000Z
DTEND:20221202T140000Z
DESCRIPTION:A somewhat beginner's guide on running neural networks on micro
-controllers\, understanding the training pipeline\, deployment and how to
update the deployed model
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Things I learned running neural networks on microcontrollers - SARA
DINDU SENGUPTA
URL:https://global2022.pydata.org/cfp/talk/Q3SBSF/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-HDNA9X@global2022.pydata.org
DTSTART:20221202T133000Z
DTEND:20221202T140000Z
DESCRIPTION:Starting a new data science project is an exciting time\, full
of exotic model possibilities and faraway incredible features. However\, th
is ocean of potentialities is treacherous and the risks of veering off course are num
erous. \n\nThis talk aims to provide a checklist to help you set a course
for your data science project\, and keep it. An industrial project about i
mage pseudo-classification will be used as a working example.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Steering a data science project - Morgane Mahaud
URL:https://global2022.pydata.org/cfp/talk/HDNA9X/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-FLQLXY@global2022.pydata.org
DTSTART:20221202T133000Z
DTEND:20221202T150000Z
DESCRIPTION:Have you ever wondered what it takes to build a production-grad
e Machine Learning platform? With so many OSS tools and frameworks\, it can
be overwhelming to figure out how to make everything work. In this workshop we
will build a production-grade model training\, model serving\, and model moni
toring platform on AWS EKS. Nothing will be local. These ideas can serve M
L Engineers\, Applied Data Scientists & Researchers to further extend them
and develop a holistic picture of building an ML Platform on OSS.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Building a Machine Learning Platform with OSS in 90 min - Anindya S
aha
URL:https://global2022.pydata.org/cfp/talk/FLQLXY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GEHBLR@global2022.pydata.org
DTSTART:20221202T140000Z
DTEND:20221202T143000Z
DESCRIPTION:The Awkward Array project provides a library for operating on n
ested\,\nvariable length data structures with NumPy-like idioms. We presen
t two\nprojects that provide native support for Awkward Arrays in the broa
der\nPyData ecosystem. In dask-awkward we have implemented a new Dask\ncol
lection to scale up and distribute workflows with partitioned\nAwkward Arr
ays. In awkward-pandas we have implemented a new Pandas\nextension array t
ype\, making it easy to use Awkward Arrays in Pandas\nworkflows and enabli
ng massive acceleration in the processing of\nnested data. We will show ho
w these projects plug into PyData and\npresent some compelling use cases.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Extending Awkward Array into the broader PyData Ecosystem - Doug Da
vis
URL:https://global2022.pydata.org/cfp/talk/GEHBLR/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-SZXLKG@global2022.pydata.org
DTSTART:20221202T140000Z
DTEND:20221202T143000Z
DESCRIPTION:The virtual chemical universe is expanding rapidly as open acce
ss titan databases Enamine Database (20 Billion)\, Zinc Database (2 Billio
n)\, PubMed Database (68 Million) and cheminformatic tools\nto process\, m
anipulate\, and derive new compound structures are being established. We p
resent our open source knowledge graph\, Global-Chem\, written in Python t
o distribute dictionaries of common chemical lists relevant to differen
t sub-communities out to the general public\, i.e. What is inside Food? Canna
bis? Sex Products? Chemical Weapons? Narcotics? Medical Therapeutics? \n\n
To navigate new chemical space\, we use our data as a reference index to
help us keep track of common patterns of interest and help us explore new
chemicals that could be theoretically real. In our talk\, we will present
the chemical data\, the rules governing the data and its integrity\, and
how to use our tools to understand the chemical universe with Python.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:The Pythonic Common Chemical Universe - Suliman Sharif
URL:https://global2022.pydata.org/cfp/talk/SZXLKG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-VBG3LX@global2022.pydata.org
DTSTART:20221202T143000Z
DTEND:20221202T150000Z
DESCRIPTION:Machine learning operations (MLOps) are often synonymous with l
arge and complex applications\, but many MLOps practices help practitioner
s build better models\, regardless of the size. This talk shares best prac
tices for operationalizing a model and practical examples using the open-s
ource MLOps framework `vetiver` to version\, share\, deploy\, and monitor
models.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Practical MLOps for better models - Isabel Zimmerman
URL:https://global2022.pydata.org/cfp/talk/VBG3LX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-KQLTJ7@global2022.pydata.org
DTSTART:20221202T150000Z
DTEND:20221202T160000Z
DESCRIPTION:DJ Patil is the former U.S. Chief Data Scientist
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - DJ Patil - DJ Patil
URL:https://global2022.pydata.org/cfp/talk/KQLTJ7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-7DUJCZ@global2022.pydata.org
DTSTART:20221202T160000Z
DTEND:20221202T163000Z
DESCRIPTION:We want to present Cleora – an open-source tool for creating
a compact representation of the behavior of your clients. Cleora uses graph t
heory to transform streams of event data into embeddings. It is suitable as
an input for training models like churn\, propensity and recommender syst
ems. This talk is useful for anyone who wishes to learn how to work with
event data of clients and how to model client behavior.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:On creating behavioral profiles of your customer from event stream
data – introduction to Cleora\, the open-source tool for real time multi
modal modeling. - Dominika Basaj\, Barbara Rychalska
URL:https://global2022.pydata.org/cfp/talk/7DUJCZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-BHZWDC@global2022.pydata.org
DTSTART:20221202T160000Z
DTEND:20221202T180000Z
DESCRIPTION:In this workshop\, attendees will learn how to create data anno
tation guidelines from a user experience (UX) perspective.\n\nCreating ann
otation guidelines from a UX perspective means imbuing them with usability
\, resulting in a better experience for annotators\, and more effective an
d productive annotation campaigns. With Python being at the forefront of M
achine Learning and data science\, we believe that the Python community wi
ll benefit from learning more about the design of data annotation guidelin
es and why they are essential for creating great machine learning applicat
ions.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Data annotation for humans: Creating and refining annotation guidel
ines from a UX perspective - Damian Romero
URL:https://global2022.pydata.org/cfp/talk/BHZWDC/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GEULJ9@global2022.pydata.org
DTSTART:20221202T160000Z
DTEND:20221202T180000Z
DESCRIPTION:Nowadays we know the social media and tech giants are harvesting
tons of data from their users and most of us agree that the capability of
these companies to deliver their suggestions and customization for you is
driven by big data.\n\nHowever\, this brings a question: Is more data alw
ays better? Does more data equal a more accurate model? When do you need b
ig data and when does it start becoming a bad idea? Let's find out in this
panel session.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Too much data? When big data starts to become a bad idea - Cheuk Ti
ng Ho\, Jesper Dramsch\, Alexander CS Hendorf\, Katrina Riehl\, John Sanda
ll
URL:https://global2022.pydata.org/cfp/talk/GEULJ9/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-ZCJSRH@global2022.pydata.org
DTSTART:20221202T160000Z
DTEND:20221202T163000Z
DESCRIPTION:Data practitioners use distributed computing frameworks such as
Spark\, Dask\, and Ray to work with big data. One of the major pain point
s of these frameworks is testability. For testing simple code changes\, us
ers have to spin up local clusters\, which have a high overhead. In some c
ases\, code dependencies force testing against a cluster. Because testing
on big data is hard\, it becomes easy for practitioners to avoid testing e
ntirely. In this talk\, we’ll show best practices for testing big data a
pplications. By using Fugue to decouple logic and execution\, we can bring
more tests locally and make it easier for data practitioners to test with
low overhead.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Testing Big Data Applications (Spark\, Dask\, and Ray) - Han Wang
URL:https://global2022.pydata.org/cfp/talk/ZCJSRH/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WQQQQY@global2022.pydata.org
DTSTART:20221202T163000Z
DTEND:20221202T170000Z
DESCRIPTION:Production workflows in machine learning have their own requireme
nts compared to DevOps. In this talk\, I will present a new library we are
developing called "skops" that's built to improve production workflows fo
r scikit-learn models.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Improving production workflows for scikit-learn models with skops -
Merve Noyan
URL:https://global2022.pydata.org/cfp/talk/WQQQQY/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-E7BKUX@global2022.pydata.org
DTSTART:20221202T163000Z
DTEND:20221202T170000Z
DESCRIPTION:This talk will go into how Deep Learning is changing the world
of Cheminformatics. We will dive deep into how traditional
NLP Transformer models can enable us to perform a totally unrelated
task such as Drug Discovery. This talk will give a brief introduction to
the field of Cheminformatics and then go into detail as to how and what ki
nd of Transformers can be utilized for the task at hand.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Let's Discover Drugs using Deep Learning - Rahul Baboota
URL:https://global2022.pydata.org/cfp/talk/E7BKUX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-RXGYC7@global2022.pydata.org
DTSTART:20221202T170000Z
DTEND:20221202T173000Z
DESCRIPTION:What if you're a two man machine learning team deploying models
to users? What if you don't have a full blown team of Data Engineers work
ing with you? What if nobody around you cares about making that nasty prod
uction data available in a pristine feature store? What if you don't even
have time to build out your entire Machine Learning platform? \n\nThere mu
st be a way to still deliver your ML model to users\, right? There must be a w
ay to deliver value. \n\nIn this session\, I'll talk about how small team
s address the problem of delivering ML-value to users. At a reasonable sca
le. I'll go over some misconceptions and lessons-learned from 4 years work
ing with early-stage startups.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:MLOps for the rest of us: A poor man's guide to putting models in p
roduction - Duarte Carmo
URL:https://global2022.pydata.org/cfp/talk/RXGYC7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-HPPMEJ@global2022.pydata.org
DTSTART:20221202T170000Z
DTEND:20221202T173000Z
DESCRIPTION:Moving data in and out of a warehouse is both tedious and time-
consuming. In this talk\, we will demonstrate a new approach using the Sno
wpark Python library. Snowpark for Python is a new interface for Snowflake
warehouses with Pythonic access that enables querying DataFrames without
having to use SQL strings\, using open-source packages\, and running your
model without moving your data out of the warehouse. We will discuss the f
ramework and showcase how data scientists can design and train a model end
-to-end\, upload it to a warehouse and append new predictions using notebo
oks.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Machine Learning in the Warehouse with Python - Allan Campopiano
URL:https://global2022.pydata.org/cfp/talk/HPPMEJ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-HFGGPM@global2022.pydata.org
DTSTART:20221202T173000Z
DTEND:20221202T180000Z
DESCRIPTION:Everyone who codes can save time by reusing configuration — w
hether for logging in to cloud providers or databases\, spinning up Docker
containers\, or sending notifications. The Prefect open source library pr
ovides you with blocks - sharable\, reusable\, and secure configuration wi
th code. Blocks can be created and edited through the Prefect UI or Python
code\, allowing for easier collaboration with team members of all skill l
evels.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Better Python Coding with Prefect Blocks - Jeff Hale
URL:https://global2022.pydata.org/cfp/talk/HFGGPM/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-XNUDTH@global2022.pydata.org
DTSTART:20221202T173000Z
DTEND:20221202T180000Z
DESCRIPTION:Model training is a time-consuming\, data-intensive\, and resou
rce-hungry phase in machine learning\, with much use of storage\, CPUs\, a
nd GPUs. The data access pattern in training requires frequent I/O of a ma
ssive number of small files\, such as images and audio files. With the adv
ancement of distributed training in the cloud\, it is challenging to maint
ain the I/O throughput to keep expensive GPUs highly utilized without wait
ing for access to data. The unique data access patterns and I/O challenges
associated with model training compared to traditional data analytics nec
essitate a change in the architecture of your data platform.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:How to Eliminate the I/O Bottleneck and Continuously Feed the GPU W
hile Training in the Cloud - Lu Qiu
URL:https://global2022.pydata.org/cfp/talk/XNUDTH/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3VRJLZ@global2022.pydata.org
DTSTART:20221202T180000Z
DTEND:20221202T190000Z
DESCRIPTION:Gabriela de Queiroz is a Principal Cloud Advocate Manager at Mi
crosoft.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - Gabriela de Queiroz - Gabriela de Queiroz
URL:https://global2022.pydata.org/cfp/talk/3VRJLZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-KTEUCA@global2022.pydata.org
DTSTART:20221202T190000Z
DTEND:20221202T193000Z
DESCRIPTION:NetworkX is the most popular graph/network library in Python. I
t is easy to use\, well documented\, easy to contribute to\, extremely fle
xible\, and extremely slow for large graphs. \nAn upcoming release begins
to fix that last issue by calling fast GraphBLAS implementations instead o
f the native Python implementation.\n\nIf you use NetworkX or have ever wr
itten a graph algorithm\, this talk will be of interest to you as it shows
how NetworkX is planning on a path of pluggable algorithm libraries so us
ers can opt-in to faster implementations with minimal code changes.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:100x Faster NetworkX: Dispatching to GraphBLAS - Jim Kitchen\, Erik
Welch\, Mridul Seth
URL:https://global2022.pydata.org/cfp/talk/KTEUCA/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-UVFTLD@global2022.pydata.org
DTSTART:20221202T190000Z
DTEND:20221202T193000Z
DESCRIPTION:Data pipelines consist of graphs of computations that produce a
nd consume data assets like tables and ML models.\n\nData practitioners of
ten use workflow engines like Airflow to define and manage their data pipe
lines. But these tools are an odd fit - they schedule tasks\, but miss tha
t tasks are built to produce and maintain data assets. They struggle to re
present dependencies that are more complex than “run X after Y finishes
” and lose the trail on data lineage.\n\nDagster is an open-source frame
work and orchestrator built to help data practitioners develop\, test\, an
d run data pipelines. It takes a declarative approach to data orchestratio
n that starts with defining data assets that are supposed to exist and the
upstream data assets that they’re derived from.\n\nAttendees of this se
ssion will learn how to develop and maintain data pipelines in a way that
makes their datasets and ML models dramatically easier to trust and evolve
.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Data pipelines != workflows: orchestrating data with Dagster - Sand
y Ryza
URL:https://global2022.pydata.org/cfp/talk/UVFTLD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-U7ZHRW@global2022.pydata.org
DTSTART:20221202T190000Z
DTEND:20221202T203000Z
DESCRIPTION:Add to your machine learning arsenal with an introduction to si
mulation in Python using SimPy! Simulations are increasingly important in
machine learning\, with applications that include simulating the spread of
COVID-19 to make decisions about public policy\, vaccination and shutdown
s.\n\nYou can use simulation to answer questions like\, Can you increase p
rofits by adding more tables or staff to your restaurant? You can also use
simulation to create data for modeling when it's hard or impossible to ge
t (e.g. simulate purchases in response to promotions on certain products t
o see if they increase sales).\n\nTo benefit from this talk\, you'll need
to know a small amount of Python\, specifically how to write functions and
simple classes. No previous knowledge of simulation needed! If you know a
bout simulation in another language and want to see a SimPy example\, you
can also benefit from this talk. You'll get a Jupyter notebook with a simp
le but fully worked out example to follow along with and to study on your
own time after the conference.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Simulations in Python: Discrete Event Simulation with SimPy - Lara
Kattan
URL:https://global2022.pydata.org/cfp/talk/U7ZHRW/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-8DKYRH@global2022.pydata.org
DTSTART:20221202T190000Z
DTEND:20221202T203000Z
DESCRIPTION:Machine learning algorithms\, especially artificial neural netw
orks\, are not tolerant of missing data. Many practitioners simply remove
records with missing fields without any consideration for the potential s
tatistical bias that might be introduced. The field of imputation has beco
me mature with imputations not only predicting missing values\, but reflec
ting the uncertainty in the prediction. Traditional statistical estimators
make use of the full benefits offered by advanced imputation techniques.
This tutorial illustrates techniques and architectures that can incorporat
e advanced imputation techniques into machine learning pipelines including
artificial neural networks.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial II
SUMMARY:Missing Data in the Age of Machine Learning - Haw-minn Lu\, Haoyin
Xu
URL:https://global2022.pydata.org/cfp/talk/8DKYRH/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DV3ZGE@global2022.pydata.org
DTSTART:20221202T193000Z
DTEND:20221202T200000Z
DESCRIPTION:Vaex is an incredibly powerful DataFrame library that allows on
e to work with datasets much larger than RAM on a single node. It combines
memory mapping\, lazy evaluations\, efficient C++ algorithms\, and a vari
ety of other tricks to empower your off-the-shelf laptop and make it crunc
h through a billion samples in real time.\n\nA common use-case for Vaex is
as a backend for data apps\, especially if one needs to process\, transfo
rm\, and visualize a larger amount of data in real time. Vaex implements a
number of features that have been specifically designed to improve perfor
mance of data hungry dashboards or apps\, namely:\n - caching\n - async ev
aluations\n - early stopping of operations\n - progress bars\n\nIn this ta
lk we will showcase how you can use these features to build efficient dash
boards and data apps\, regardless of the data app library you prefer using
.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Vaex: the perfect DataFrame Library for Python data apps - Jovan Ve
ljanoski\, Maarten Breddels
URL:https://global2022.pydata.org/cfp/talk/DV3ZGE/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-P3J9JA@global2022.pydata.org
DTSTART:20221202T193000Z
DTEND:20221202T200000Z
DESCRIPTION:Daft is an open-sourced distributed dataframe library built for
"Complex Data" (data that doesn't usually fit in a SQL table such as imag
es\, videos\, documents etc). \n\n**Experiment Locally\, Scale Up in the C
loud**\n\nDaft grows with you and is built to run just as efficiently/seam
lessly in a notebook on your laptop or on a Ray cluster consisting of thou
sands of machines with GPUs.\n\n**Pythonic**\n\nDaft lets you have tables
of any Python object such as images/audio/documents/genomic files. This ma
kes it really easy to process your Complex Data alongside all your regular
tabular data. Daft is dynamically typed and built for fast iteration\, ex
perimentation and productionization.\n\n**Blazing Fast**\n\nDaft is built
for distributed computing and fully utilizes all of your machine's or
cluster's resources. It uses modern technologies such as Apache Arrow\, P
arquet and Iceberg for optimizing data serialization and transport.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Daft: the Distributed Python Dataframe for "Complex Data" (images\,
video\, documents and more) - Jay Chia\, Sammy Sidhu
URL:https://global2022.pydata.org/cfp/talk/P3J9JA/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-U9XTNQ@global2022.pydata.org
DTSTART:20221202T200000Z
DTEND:20221202T203000Z
DESCRIPTION:With Python emerging as the primary language for data science\,
pandas has grown rapidly to become one of the standard data science libra
ries. One of the known limitations in pandas is that it does not scale wit
h your data volume linearly due to single-machine processing.\nPandas API
on Spark overcomes the limitation\, enabling users to work with large data
sets by leveraging Apache Spark. In this talk\, we will introduce Pandas A
PI on Spark and help you scale your existing data science workloads using
that. Furthermore\, we will share the cutting-edge features in Pandas API
on Spark.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Scale Data Science by Pandas API on Spark - Xinrong Meng\, Takuya U
eshin
URL:https://global2022.pydata.org/cfp/talk/U9XTNQ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CZBP8K@global2022.pydata.org
DTSTART:20221202T200000Z
DTEND:20221202T203000Z
DESCRIPTION:"It works on my machine"... those dreaded words. \n\n"I'm not a
developer\, I don't know how to test"... arghhh.\n\n"Let QA test it"....\
n\nNo more excuses. Learn how to debug and test Pandas code.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Testing Pandas: Shoots\, leaves\, and garbage! - Matt Harrison
URL:https://global2022.pydata.org/cfp/talk/CZBP8K/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-R7CWEZ@global2022.pydata.org
DTSTART:20221202T203000Z
DTEND:20221202T213000Z
DESCRIPTION:Quincy Larson is the Founder of freecodecamp.org.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - Quincy Larson - Quincy Larson
URL:https://global2022.pydata.org/cfp/talk/R7CWEZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-BWU9YV@global2022.pydata.org
DTSTART:20221202T213000Z
DTEND:20221202T220000Z
DESCRIPTION:Transformer models are all around in the deep learning communit
y and this talk will help to better understand why transformers achieve su
ch impressive results. Using various explainability techniques and plain n
umpy examples\, participants will gain an understanding of the attention m
echanism\, its implementation\, and how it all comes together.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Everything you need to know about Transformer Models - Mike Rothenh
äusler
URL:https://global2022.pydata.org/cfp/talk/BWU9YV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-CYWTDZ@global2022.pydata.org
DTSTART:20221202T213000Z
DTEND:20221202T223000Z
DESCRIPTION:Lightning Talks are short 5-10 minute sessions presented by
community members on a variety of interesting topics.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Lightning Talks - Archit Khosla\, Josh Seltzer\, Cameron Devine PhD
\, Ray Bell\, Aidan Russell
URL:https://global2022.pydata.org/cfp/talk/CYWTDZ/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-VQRC77@global2022.pydata.org
DTSTART:20221202T220000Z
DTEND:20221202T223000Z
DESCRIPTION:This talk will show you a simple yet effective technique to vis
ualize larger-than-memory datasets on your laptop by leveraging SQLite or
DuckDB. No need to spin up a Spark cluster!
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:You don't need a cluster for that: using embedded SQL engines for p
lotting massive datasets on a laptop - Eduardo Blancas
URL:https://global2022.pydata.org/cfp/talk/VQRC77/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GU9AJ7@global2022.pydata.org
DTSTART:20221202T223000Z
DTEND:20221202T233000Z
DESCRIPTION:Join us for the traditional PyData Pub Quiz\, hosted by quizmas
ters James Powell and Cameron Riddell. The event is open to everyone and w
ill be located in Gather.
DTSTAMP:20240328T151843Z
LOCATION:Community Events & Sponsor Sessions
SUMMARY:PyData Pub Quiz - Quizmaster James Powell\, Cameron Riddell
URL:https://global2022.pydata.org/cfp/talk/GU9AJ7/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GMPTUX@global2022.pydata.org
DTSTART:20221203T080000Z
DTEND:20221203T083000Z
DESCRIPTION:Why is the process of transforming research into a “real worl
d” product so full of question marks? We often know where the research j
ourney starts but have uncertainty about how and WHEN it ends.\n\nIn this
talk\, I will share my own experience leading algorithmic teams through th
e cycle of research into the production of live-streaming AI products. I w
ill also share how to mitigate between agile incremental delivery and gian
t leaps forward that require longer research. How understanding the minimu
m viable product (MVP) way of thinking can help not only managers but ever
y developer. Learn to outline MVP for new AI capabilities\, and move forwa
rd with production in mind\, while always raising the quality standards. A
t the end of this session\, you will get the boost you need to take the da
ta-driven experimental mindset to the next level\, spiced with methodologi
es you can adapt to development as well as research.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Bon Voyage! Leading machine learning research journeys with happy (
into-production) endings - Topaz Gilad
URL:https://global2022.pydata.org/cfp/talk/GMPTUX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GQXZ93@global2022.pydata.org
DTSTART:20221203T083000Z
DTEND:20221203T090000Z
DESCRIPTION:The value of an ML model is not realized until it is deployed a
nd served in production. Building an ML application is more challenging co
mpared to a traditional application due to the added complexities from mod
els and data in addition to the application code. Using web serving framew
orks (e.g. FastAPI) can work for the simple cases but falls short for perf
ormance and efficiency. Alternatively\, using pre-packaged model servers
(e.g. Triton Inference Server) can be ideal for low-latency serving and re
source utilization but lacks flexibility in defining custom logic and depe
ndency. BentoML abstracts the complexities by creating separate runtimes f
or IO-intensive preprocessing logic and compute-intensive model inference
logic. Simultaneously\, BentoML offers an intuitive and flexible Python-fi
rst SDK for defining custom preprocessing logic\, orchestrating multi-mode
l inference\, and integrating with other frameworks in the MLOps ecosystem
.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Building an ML Application Platform from the Ground Up - Sean Sheng
URL:https://global2022.pydata.org/cfp/talk/GQXZ93/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-QVCU3E@global2022.pydata.org
DTSTART:20221203T093000Z
DTEND:20221203T100000Z
DESCRIPTION:Model traceability and reproducibility are crucial steps when d
eploying machine learning models. Model traceability allows us to know whi
ch version of the model generated which prediction. Model reproducibility
ensures that we can roll back to the previous versions of the model anytim
e we want.\nWe\, as ML engineers\, designed reusable workflows which enabl
e data scientists to follow these two principles by design.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:ML Model Traceability and Reproducibility by Design - Basak Eskili\
, Maria Vechtomova
URL:https://global2022.pydata.org/cfp/talk/QVCU3E/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-KC893G@global2022.pydata.org
DTSTART:20221203T100000Z
DTEND:20221203T103000Z
DESCRIPTION:Automatic speech recognition (ASR) is used in many devices to i
dentify bilingual speech data. Bilingual language\, or in more scientific te
rms a code-switched language\, is two or more languages being mixed in a spe
ech utterance. In this presentation\, learn about different deep learning
techniques that can be used for the classification of such speech utteranc
es. If you are a beginner in this field and don't know where to start\, jo
in me to explore this use case and learn something new!
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Implementation and analysis of deep learning models for codeswitche
d speech classification - Yashasvi Misra
URL:https://global2022.pydata.org/cfp/talk/KC893G/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WFCX9M@global2022.pydata.org
DTSTART:20221203T110000Z
DTEND:20221203T113000Z
DESCRIPTION:Named entity recognition models might not be able to handle a w
ide variety of spans\, but Spancat certainly can! Within our open-source l
ibrary for NLP\, spaCy\, we've created a NER model to handle overlapping a
nd arbitrary text spans. Dive into named entity recognition\, its limitati
ons\, and how we've solved them with a solution-focused talk and practical
applications.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Is it possible to have entities within entities within entities? -
Victoria Slocum
URL:https://global2022.pydata.org/cfp/talk/WFCX9M/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-RNVDBT@global2022.pydata.org
DTSTART:20221203T110000Z
DTEND:20221203T130000Z
DESCRIPTION:Programmers\, regardless of their level of experience\, enjoy s
olving increasingly complex challenges within their domains of expertise\,
and one of the main reasons they can spend more time working on different
challenges is because of the workflows they put in place around their pro
jects. Data Engineers build pipelines to make sure the company's data is i
n optimal condition for Analysts to answer business critical questions\, f
or Data Scientists to automate the selection\, engineering\, and analysis
of distinct features before training models\, and for machine learning eng
ineers to know where to get data from\, or send it to\, for the APIs they
build. On the other hand\, developers automate the infrastructures of soft
ware products to reduce time to market of new features. These groups of da
ta professionals and engineers are not too foreign to each other as they a
ll speak the same language\, Python. That said\, the goal of this workshop
is to dive deep into different workflow patterns for building pipelines f
or data and machine learning projects. In other words\, this workshop brid
ges the gap between building one-off projects and building automated and r
eusable pipelines\, all while creating an environment that welcomes both
newcomers and experts to either the data and machine learning fields or t
he engineering one.
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Workflows Deep Dive: From Data Engineering to Machine Learning - Ra
mon Perez
URL:https://global2022.pydata.org/cfp/talk/RNVDBT/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-EN3CWF@global2022.pydata.org
DTSTART:20221203T113000Z
DTEND:20221203T120000Z
DESCRIPTION:What would the sunset painted by van Gogh look like? And the fr
ont of your house? This is entirely possible with Deep Learning. The Neura
l Style Transfer technique aims to compose images in the style of another
image\, modifying the content while preserving it at the same time.\n\nIn this l
ecture\, the concepts of Deep Learning\, neural networks\, and the step-by
-step process to carry out style transfer will be introduced.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Mixing art with Python: an introduction to Style Transfer - Isac Mo
ura Gomes
URL:https://global2022.pydata.org/cfp/talk/EN3CWF/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-ADH33C@global2022.pydata.org
DTSTART:20221203T120000Z
DTEND:20221203T123000Z
DESCRIPTION:Abstract\nThere are many stories about Data Science hires that
end up working in silos\, buried in ad hoc business requests. According to
Gartner\, only 20% of analytic insights will deliver business outcomes in
2022. And a large number of Machine Learning Models never go to productio
n. On top of that\, work satisfaction among data professionals is staggeri
ngly low\; for instance\, 97% of data engineers reported feeling burnt out
in a 2021 Wakefield Research Survey. Furthermore\, despite living in the
era of information\, many business executives are making decisions based o
n guesswork because they lack timely access to relevant data. This talk co
vers why many data initiatives fail and\, more import
antly\, how to prevent it. I lay out a number of practical approaches base
d on work experience that will help you to unlock the potential of data an
d analytics — from how to build the case and gain buy-in to promoting
a fact-based decision-making culture. This talk is for you if you are a b
usiness leader sponsoring data initiatives\, if you work in data applicati
ons\, or if you would benefit from enhanced analytics.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:A Practical Approach To Unlock Value From Data and Analytics - Mari
a Feria
URL:https://global2022.pydata.org/cfp/talk/ADH33C/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-AWYSLG@global2022.pydata.org
DTSTART:20221203T120000Z
DTEND:20221203T133000Z
DESCRIPTION:Lightning Talks are short 5-10 minute sessions presented by
community members on a variety of interesting topics.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Lightning Talks - Aadit Kapoor\, Ted Conway\, Roshini Sudhaharan\,
Shrabastee Banerjee\, Shivay Lamba\, SARADINDU SENGUPTA\, Srivatsa Kundurt
hy\, Kefentse Mothusi\, Lutz Ostkamp\, Srikanth
URL:https://global2022.pydata.org/cfp/talk/AWYSLG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-FQBSP8@global2022.pydata.org
DTSTART:20221203T123000Z
DTEND:20221203T130000Z
DESCRIPTION:We learn about the world from data\, drawing on a broad array o
f statistical and inferential tools. The problem is that causal reasoning
is needed to answer many of our questions\, but few data scientists have t
his in their skill set. This talk will give a high-level introduction to a
spects of causal reasoning and how it is complemented by Bayesian inferenc
e. A worked example will be given of how to answer what-if questions.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:What-if? Causal reasoning meets Bayesian Inference - Benjamin Vince
nt
URL:https://global2022.pydata.org/cfp/talk/FQBSP8/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-H3BMVC@global2022.pydata.org
DTSTART:20221203T130000Z
DTEND:20221203T143000Z
DESCRIPTION:The real world is a constant source of ever-changing and non-st
ationary data. That ultimately means that even the best ML models will eve
ntually go stale. Data distribution shifts\, in all of their forms\, are o
ne of the major post-production concerns for any ML/data practitioner. As
organizations are increasingly relying on ML to improve performance as int
ended outside of the lab\, the need for efficient debugging and troublesho
oting tools in the ML operations world also increases. That becomes especi
ally challenging when taking into consideration common requirements in the
production environment\, such as scalability\, privacy\, security\, and r
eal-time concerns.\n\nIn this talk\, Data Scientist Felipe Adachi will tal
k about different types of data distribution shifts and how these issues c
an affect your ML application. Furthermore\, the speaker will discuss the
challenges of enabling distribution shift detection in data in a lightweig
ht and scalable manner by calculating approximate statistics for drift mea
surements. Finally\, the speaker will walk through steps that data scienti
sts and ML engineers can take in order to surface data distribution shift
issues in a practical manner\, such as visually inspecting histograms\, ap
plying statistical tests and ensuring quality with data validation checks.
\n\nRequirements: Access to Google Colab Environment\n\nAdditional Materia
l: https://colab.research.google.com/drive/1xOcAq8NwPazmQFhXVEvzRxXw5LiFqv
fj?usp=sharing
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Visually Inspecting Data Profiles for Data Distribution Shifts - Fe
lipe de Pontes Adachi
URL:https://global2022.pydata.org/cfp/talk/H3BMVC/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-EKQXW9@global2022.pydata.org
DTSTART:20221203T133000Z
DTEND:20221203T140000Z
DESCRIPTION:It’s common to hear about demand forecasting in the e-commerc
e ecosystem. Indeed\, it plays a pivotal role in logistics and inventory a
pplications. However\, due to uncertainty impacting demand and the stochas
tic nature of most downstream applications\, the need for probabilistic de
mand forecasting emerges. Moreover\, for the most realistic use cases\, yo
u’ll have to forecast for thousands if not hundreds of thousands of time
series. The problem we will explore together is: how can we get probabili
stic forecasts that embrace uncertainty and scale?\n\nThe talk is light-he
arted\, contains few math formulas\, and is aimed at forecasting practitio
ners! If you are new to the topic of forecasting\, you'll be able to follo
w! We take the time to pose the problems and go deeper from there.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Probabilistic demand forecasting at scale - Hagop Dippel
URL:https://global2022.pydata.org/cfp/talk/EKQXW9/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-93N8LR@global2022.pydata.org
DTSTART:20221203T140000Z
DTEND:20221203T143000Z
DESCRIPTION:In this talk we present Hamilton\, a novel open-source framewor
k for developing and maintaining scalable feature engineering dataflows. H
amilton was initially built to solve the problem of managing a codebase of
transforms on pandas dataframes\, enabling a data science team to scale t
he capabilities they offer with the complexity of their business. Since th
en\, it has grown into a general-purpose tool for writing and maintaining
dataflows in Python. We introduce the framework\, discuss its motivations
and initial successes at Stitch Fix\, and share recent extensions that se
amlessly integrate it with distributed compute offerings\, such as Dask\,
Ray\, and Spark.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Scalable Feature Engineering with Hamilton - Elijah ben Izzy\, Stef
an Krawczyk
URL:https://global2022.pydata.org/cfp/talk/93N8LR/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-T8WU9K@global2022.pydata.org
DTSTART:20221203T143000Z
DTEND:20221203T150000Z
DESCRIPTION:Media Mix Modeling\, also called Marketing Mix Modeling (MMM)\,
is a technique that helps advertisers to quantify the impact of several m
arketing investments on sales.\n\nIf a company advertises in multiple medi
a (TV\, digital ads\, magazines\, etc.)\, how can we measure the effective
ness and make future budget allocation decisions? Traditionally\, regressi
on modeling has been used\, but obtaining actionable insights with that ap
proach has been challenging.\n\nRecently\, many researchers and data scien
tists have tackled this problem using Bayesian statistical approaches. For
example\, Google has published multiple papers about this topic.\n\nIn th
is talk\, I will show the key concepts of a Bayesian approach to MMM\, its
implementation using Python\, and practical tips.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Media Mix Modeling: How to Measure the Effectiveness of Advertising
in Python - Hajime Takeda
URL:https://global2022.pydata.org/cfp/talk/T8WU9K/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-3SDFWF@global2022.pydata.org
DTSTART:20221203T150000Z
DTEND:20221203T163000Z
DESCRIPTION:MLOps encapsulates the discipline of – and infrastructure tha
t supports – building and maintaining machine learning models in product
ion. This tutorial highlights five challenges in carrying this out effecti
vely: scalability\, data quality\, reproducibility\, recoverability\, and
auditability. As a data science and machine learning practitioner\, you’
ll learn how Flyte\, an open source data- and machine-learning-aware orche
stration tool\, is designed to overcome these challenges and you'll get yo
ur hands dirty using Flyte to build ML pipelines with increasing complexit
y and scale!
DTSTAMP:20240328T151843Z
LOCATION:Workshop/Tutorial I
SUMMARY:Production-grade Machine Learning with Flyte - Niels Bantilan
URL:https://global2022.pydata.org/cfp/talk/3SDFWF/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-M3KSXT@global2022.pydata.org
DTSTART:20221203T150000Z
DTEND:20221203T153000Z
DESCRIPTION:Throughout the COVID pandemic\, we’ve experienced extremes br
ought on by economic downturns and uncertainty across industries—to this
day\, we are feeling these effects around the globe. In fact\, statistics
show that many professionals have changed careers following the waves of
layoffs that have recently occurred—but how? How can we best prepare for
this type of situation\, and how easy or difficult is it to change career
s? If these questions have been on your mind\, join this session to learn
about several global industry trends\, ways to adapt to career changes\, a
nd how to grow your tech skills and leverage certain platforms to support
your learning process.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Navigating Career Adjustments in Times of Uncertainty - Jose Mesa
URL:https://global2022.pydata.org/cfp/talk/M3KSXT/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-9ANXPG@global2022.pydata.org
DTSTART:20221203T150000Z
DTEND:20221203T153000Z
DESCRIPTION:The energy sector has gained great attention in 2022 due to the
current global energy crisis. Understanding which technologies and techni
ques are suitable for this sector is crucial to guarantee an effective tra
nsition to a future with cleaner\, more efficient energy sources. This talk a
ims to educate tech professionals interested in the applications of machin
e learning in the energy sector\, especially when it comes to time series
analysis and forecasting. The audience is expected to have a basic unders
tanding of data science and machine learning\, and will be introduced to t
he concepts of time series\, as well as the most common techniques utilize
d in the sector.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:A dive into time series for the energy sector - Rosana de Oliveira
Gomes
URL:https://global2022.pydata.org/cfp/talk/9ANXPG/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-7VWGNV@global2022.pydata.org
DTSTART:20221203T153000Z
DTEND:20221203T160000Z
DESCRIPTION:Inspirational sports speeches have motivated and reinvigorated
folks for years. Whether you’re a developer or an athlete\, they’ve wi
thstood the journey because even the smartest\, the bravest\, and the most
resilient need some encouragement on occasion. \n\nDuring our time togeth
er\, we’ll use Python and a speech-to-text provider to transcribe sports
podcasts that contain inspirational speeches. We’ll discover insights f
rom the transcripts to determine which ones might give you a boost of ener
gy or rally your team. \n\nWe’ll discover common topics of each sports p
odcast episode and measure how they leave us feeling: victorious or perhap
s overcoming the agony of defeat. We’ll also investigate if there are an
y similarities and differences in the sports speeches and what makes a gre
at motivational speech that moves people to action.\n\nBy the end\, you’
ll have a better understanding of using speech recognition in real-world s
cenarios and using features of Machine Learning with Python to derive insi
ghts.\n\nThis talk is for developers of all levels\, including beginners.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Discover Inspirational Insights in Motivational Sports Speeches Usi
ng Speech-to-Text - Tonya Sims
URL:https://global2022.pydata.org/cfp/talk/7VWGNV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-GTV9MP@global2022.pydata.org
DTSTART:20221203T153000Z
DTEND:20221203T160000Z
DESCRIPTION:You need to quickly process a large amount of data—but runnin
g Python code is slow.\nLibraries like NumPy and Pandas bridge this perfor
mance gap using a technique called vectorization.\nTo take full adva
ntage of these libraries to speed up your code\, it's helpful to understan
d what vectorization means and when and how it works.\n\nIn this talk you'
ll learn what vectorization means (there are 3 different definitions!)\, how
it speeds up your code\, and how to apply it to your code.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Speed up Python data processing with vectorization - Itamar Turner-
Trauring
URL:https://global2022.pydata.org/cfp/talk/GTV9MP/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-83CCLU@global2022.pydata.org
DTSTART:20221203T160000Z
DTEND:20221203T163000Z
DESCRIPTION:Bad data is likely the largest factor limiting your model's per
formance. We'll talk about common data errors and how you can fix them tod
ay using Galileo. Although the majority of examples used will be in CV and
NLP\, the same insights apply to other modalities!
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Critical CV/NLP Data Errors and How to Fix Them with Galileo - Niki
ta Demir
URL:https://global2022.pydata.org/cfp/talk/83CCLU/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-7W3ZE8@global2022.pydata.org
DTSTART:20221203T163000Z
DTEND:20221203T170000Z
DESCRIPTION:The electrochemical battery is one of the most important techno
logies for a renewable future. In this beginner-friendly talk\, we will wa
lk through how fundamental quantum mechanics and data science inform how w
e fine-tune battery materials for higher performance. We will also show ho
w we used these techniques to computationally model a lithium-oxygen batte
ry in Python.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Developing Battery Materials with Python - Gabriel Birnbaum
URL:https://global2022.pydata.org/cfp/talk/7W3ZE8/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-WMKJ8F@global2022.pydata.org
DTSTART:20221203T163000Z
DTEND:20221203T170000Z
DESCRIPTION:The International Monetary Fund (IMF) provides a huge variety o
f economic datasets from different countries. We have explored the Python
API for data extraction from the IMF\, which allows users (primarily econo
mists or financial analysts) to access the data. The structure of the unde
rlying JSON datasets is quite complex for an unprepared user. In the talk\
, we will demonstrate the API workflow and go over the issues that we are
addressing with a new\, easier-to-use API currently in development.
This is joint work with Dr. Sou-Cheng Choi (Illinois Institute of Technolo
gy and SAS Institute Inc.).\nThe talk is primarily directed at data analys
ts and economists interested in utilizing IMF's macroeconomic data.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:IMF Data Discovery and Collection - Irina Klein
URL:https://global2022.pydata.org/cfp/talk/WMKJ8F/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-8UMQRV@global2022.pydata.org
DTSTART:20221203T170000Z
DTEND:20221203T173000Z
DESCRIPTION:There’s a growing interest from small and large companies ali
ke to move their data and their analytical pipelines into the Cloud as it
adds large cost and operational benefits to businesses. Despite this\, it
can be unclear and sometimes confusing to know how cloud services can be u
sed to replicate your existing analytical solutions in the Cloud or even h
ow services can fit together to build new solutions. \nThe goal of this ta
lk is to help answer these two questions\, first by explaining what modern
analytics look like in cloud environments and then by presenting a live us
e case for building an end-to-end analytical solution in the context of fr
aud detection for E-commerce businesses. \n\nThis talk assumes some knowle
dge of the Hadoop ecosystem and main tools such as Airflow\, Kafka\, and S
park (an overall idea is more than sufficient)\, plus some experience buil
ding and deploying machine learning models (some MLOps experience). Theref
ore\, the target audience would be d
ata scientists/engineers with 4-5 years of experience working in analytics
and/or architects looking to move their analytics solutions to the Cloud
but are still unsure how it all fits together. \n\nAt the end of the talk\,
the audience will have a clear understanding of how modern analytics can
be performed in the cloud and what a typical modern data architecture look
s like. In the context of AWS\, the audience will also have an understandi
ng of the AWS analytics service offerings and what services can be used fo
r/tailored to their needs. Finally\, the audience will gain a clearer idea
of how they can leverage ML capabilities to build a full pipeline in the
cloud while cutting their development time by half. \n\n\nThe proposed out
line for the talk will follow the description below:\n \nThe evolution of
analytics from the 90s to current day (2-3 mins)\nModern analytics in the
Cloud - what’s available (4-5 mins)\nHow analytics is done in the Cloud
- tools to help manage the cloud solutions (5 mins)\nCase study - Fraud De
tection for Ecommerce (2-3 mins)\nRefresher concepts (3 mins)\nBreaking do
wn the architecture (6-7 mins)\nScaling and improving the solution (5-6 mi
ns)
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Modern Analytics in the Cloud - A case for fraud detection - Marwa
Ahmed
URL:https://global2022.pydata.org/cfp/talk/8UMQRV/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-XGJUPX@global2022.pydata.org
DTSTART:20221203T170000Z
DTEND:20221203T173000Z
DESCRIPTION:Quarto is an open-source scientific and technical publishing sy
stem that builds on standard markdown with features essential for scientif
ic communication. The system has support for reproducible embedded computa
tions\, equations\, citations\, crossrefs\, figure panels\, callouts\, adv
anced layout\, and more. In this talk we'll explore the use of Quarto with
Python\, describing both integration with IPython/Jupyter and the Quarto
VS Code extension.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Reproducible Publications with Python and Quarto - Tom Mock
URL:https://global2022.pydata.org/cfp/talk/XGJUPX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-NZUYLM@global2022.pydata.org
DTSTART:20221203T173000Z
DTEND:20221203T180000Z
DESCRIPTION:Hugging Face Transformers is a popular open-source project with
cutting-edge Machine Learning (ML)\, but meeting the computational requir
ements for advanced models it provides often requires scaling beyond a sin
gle machine. In this session\, we explore the integration between Hugging
Face and Ray AI Runtime (AIR)\, allowing users to scale their model traini
ng and data loading seamlessly. We will dive deep into the implementation
and API and explore how we can use Ray AIR to create an end-to-end Hugging
Face workflow\, from data ingest through fine-tuning and HPO to inference
and serving.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:HuggingFace + Ray AIR Integration: A Python developer’s guide to
scaling Transformers - Antoni Baum
URL:https://global2022.pydata.org/cfp/talk/NZUYLM/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-77DGQX@global2022.pydata.org
DTSTART:20221203T173000Z
DTEND:20221203T180000Z
DESCRIPTION:"Is a lion closer to being a giraffe or an elephant?"\nIt is no
t a question anyone asks.\nSo why address that classification problem the
same as you would classification of age groups or medical condition severi
ty?\n\nThe talk will walk you through a review of regression-based approac
hes for what may seem like classification problems. Unlock the true potent
ial of your labels!
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Classification Through Regression: Unlock the True Potential of You
r Labels - Topaz Gilad
URL:https://global2022.pydata.org/cfp/talk/77DGQX/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-KGX8RD@global2022.pydata.org
DTSTART:20221203T180000Z
DTEND:20221203T190000Z
DESCRIPTION:Pia is the co-founder and CEO of Open Collective.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Keynote - Pia Mancini - Pia Mancini
URL:https://global2022.pydata.org/cfp/talk/KGX8RD/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-MHY89K@global2022.pydata.org
DTSTART:20221203T190000Z
DTEND:20221203T193000Z
DESCRIPTION:Ok\, I lied\, I still write tests. But instead of the example-b
ased tests that we normally write\, have you heard of property-based testi
ng? By using Hypothesis\, instead of thinking about what data I should tes
t with\, I let it generate test data\, including boundary cases\, for me.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:I hate writing tests\, that's why I use Hypothesis - Cheuk Ting Ho
URL:https://global2022.pydata.org/cfp/talk/MHY89K/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-DKQF3R@global2022.pydata.org
DTSTART:20221203T190000Z
DTEND:20221203T203000Z
DESCRIPTION:Lightning Talks are short 5-10 minute sessions presented by
community members on a variety of interesting topics.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track II
SUMMARY:Lightning Talks - Colleen M. Farrelly\, David Chapuis\, Dina Bavli
URL:https://global2022.pydata.org/cfp/talk/DKQF3R/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-UTM78E@global2022.pydata.org
DTSTART:20221203T193000Z
DTEND:20221203T200000Z
DESCRIPTION:How can we make smart decisions when optimizing a black-box fun
ction?\nExpensive black-box optimization refers to situations where we nee
d to maximize/minimize some input–output process\, but we cannot look in
side and see how the output is determined by the input.\nMaking the proble
m more challenging is the cost of evaluating the function in terms of mone
y\, time\, or other safety-critical conditions\, limiting the size of the
data set we can collect.\nBlack-box optimization can be found in many task
s such as hyperparameter tuning in machine learning\, product recommendati
on\, process optimization in physics\, or scientific and drug discovery.\n
\nBayesian optimization (BayesOpt) sets out to solve this black-box optimi
zation problem by combining probabilistic machine learning (ML) and decisi
on theory.\nThis technique gives us a way to intelligently design queries
to the function to be optimized while balancing between exploration (looki
ng at regions without observed data) and exploitation (zeroing in on good-
performance regions).\nWhile BayesOpt has proven effective at many real-wo
rld black-box optimization tasks\, many ML practitioners still shy away fr
om it\, believing that they need a highly technical background to understa
nd and use BayesOpt.\n\nThis talk aims to dispel that belief and offers a
friendly introduction to BayesOpt\, including its fundamentals\, how to g
et it running in Python\, and common practices.\nData scientists and ML pr
actitioners who are interested in hyperparameter tuning\, A/B testing\, or
more generally experimentation and decision making will benefit from this
talk.\nWhile most background knowledge necessary to follow the talk will
be covered\, the audience should be familiar with common concepts in ML su
ch as training data\, predictive models\, multivariate normal distribution
s\, etc.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Bayesian Optimization: Fundamentals\, Implementation\, and Practice
- Quan Nguyen
URL:https://global2022.pydata.org/cfp/talk/UTM78E/
END:VEVENT
BEGIN:VEVENT
UID:pretalx-cfp-YU8VEJ@global2022.pydata.org
DTSTART:20221203T200000Z
DTEND:20221203T203000Z
DESCRIPTION:Let’s scratch the Twitter metadata together and go below the
surface with tweepy. Want to find out if the tweets you follow are trying
to persuade you to do things? Have the feeling the advocates for some iss
ues use certain emotions to push you in certain directions? Now you can fi
nd out.
DTSTAMP:20240328T151843Z
LOCATION:Talk Track I
SUMMARY:Deep Into the Tweet - Dina Bavli
URL:https://global2022.pydata.org/cfp/talk/YU8VEJ/
END:VEVENT
END:VCALENDAR