PyData Global 2022

I am Aadit Kapoor. I love to think of new innovative ideas and most importantly try hard to make them a reality. In addition to that, I am extremely curious.
I am interested in solving technically difficult, real-world AI problems. I enjoy solving applied AI problems that are value-driven and can fundamentally change the way humans live. Additionally, I love reading about and analyzing businesses and companies.
My research areas of interest include Artificial Intelligence, Machine Learning (AI/ML in Healthcare), Data Science, and Biomedical Informatics/NLP.
I believe Data Science allows me to express my curiosity in ways I'd never imagined. The coolest thing in Data Science is that I see data not as numbers but as an opportunity (business problem), insights (predictive modelling/stats and data wrangling), and improvement (metrics).

Lightning Talks

Ada Nduka Oyom

Ada is the Founder of She Code Africa (SCA), a non-profit organisation focused on empowering young girls and women in Africa through technical skills, and Co-founder of Open Source Community Africa, one of the largest communities for open-source enthusiasts, advocates and experts across Africa. She’s currently engaged with Google as the Ecosystem Community Manager for Sub-Saharan Africa.

Keynote - Ada Nduka Oyom

Adrian Kosowski

Adrian obtained his PhD in discrete algorithms at the age of 20. He specializes in network science and modeling processes which involve graphs, time, and all things random. During a decade in academia, he ran projects on transportation systems, route planning, and logistics across Europe. He likes to experiment with data with a Python data science stack whenever he can. A big fan of competitive programming - and an even bigger fan of 24-hour contests - Adrian co-founded spoj.com, which has been used by about a million people to boost their programming skills. He happens to be the author of some of the most bizarre problem storylines you will find there. For research audiences, he has talked on topics ranging from synchronization in distributed systems to path-finding algorithms, including two Best Paper talks at major ACM conferences.
As one of the co-creators of Pathway, Adrian has spent the last two years shaping its development directions, contributing code, and being obsessed about usability.

Reactive data processing in Python

Aidan Russell

Aidan Russell has been the Head of Data Science at Infogrid for a little over 2 years. During that time the company as a whole has grown from 12 to 310 people and the Data Science team from 1 to 15 (and continues to grow). He holds a PhD in physics and has worked in data science since 2015 across a range of companies, from early-stage startups to 100-year-old corporates (and picked up some strong opinions on the best way to run things along the way).

Lightning Talks

Alejandro Herrera

Alejandro Herrera is a Solution Architect at Ponder.

Ponder provides enterprise-ready tools in Python for rapid, flexible experimentation with data at scale. Ponder makes data teams more productive by enabling them to get insights faster with tools they know and love.

Supercharging your pandas workflows with Modin

Alejandro Saucedo

Alejandro is the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he contributes to policy and industry standards on the responsible design, development and operation of AI, including the fields of explainability, GPU acceleration, ML security and other key machine learning research areas. Alejandro Saucedo is also Director of Engineering at Seldon Technologies, where he leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and has a strong track record building cross-functional teams of software engineers. He is currently appointed as governing council Member-at-Large at the Association for Computing Machinery, and is currently the Chairperson of the GPU Acceleration Kompute Committee at the Linux Foundation.

LinkedIn: https://linkedin.com/in/axsaucedo
Twitter: https://twitter.com/axsaucedo
Github: https://github.com/axsaucedo
Website: https://ethical.institute/

Metadata Systems for End-to-End Data & Machine Learning Platforms
Industrial Strength DALLE-E: Scaling Complex Large Text & Image Models

Aleksander Molak

R&D Machine Learning Engineer and Researcher at Ironscales and independent Machine Learning Researcher at TensorCell. Before joining Ironscales, Alex built end-to-end machine learning systems for Fortune Global 100 and 500 companies.
International speaker, blogger, currently working on a book on causality in Python. Interested in NLP, causality, probabilistic modeling, representation learning and graph neural networks.
Loves traveling with his wife, passionate about vegan food, languages and running.

Website: https://alxndr.io

LinkedIn: https://www.linkedin.com/in/aleksandermolak/

BERT's Achilles' heel? Applying contrastive learning to fight anisotropy in language models.

Alexander CS Hendorf

Alexander Hendorf runs his own company, opotoc, providing expertise in data and artificial intelligence, for example at the digital excellence consultancy KÖNIGSWEG. Through his commitment as a speaker and chair of various international conferences, he is a proven expert in the field of data intelligence. He has many years of experience in the practical application, introduction and communication of data and AI-driven strategies and decision-making processes in the industry. He is a Python Software Foundation Fellow and EuroPython Fellow, likes to work in small dedicated teams, and loves to work with and contribute to the Python and PyData community.

Too much data? When big data starts to become a bad idea

Allan Campopiano

As a data scientist at Deepnote, I have the privilege of partnering with developers all over the world in order to help them promote their tools to the broader scientific community. By demonstrating the leading data science tools in Deepnote, scientists and developers can easily onboard to new concepts and techniques.

My degree in cognitive and behavioural neuroscience helped me realize my dual passion for (1) developing scientific software and (2) communicating technical concepts in a straightforward manner. My main goal is to find creative ways to lower the barrier-to-entry for scientists who are learning new tools.

To this end, I've published two peer-reviewed statistical software libraries. The most notable is Hypothesize—a Python library for robust statistics based on Rand Wilcox's R package. I continue to deliver workshops on robust statistics, data visualization, and data science tooling in general.

Lightning Talks
Machine Learning in the Warehouse with Python

Allen Downey

Allen Downey is a Staff Scientist at DrivenData and Professor Emeritus at Olin College.
He is the author of several textbooks -- including Think Python, Think Bayes, and Elements of Data Science -- and "Probably Overthinking It", a blog about data science and Bayesian statistics. He received a Ph.D. in computer science from U.C. Berkeley and Bachelor's and Master's degrees from MIT.

Bayesian Decision Analysis

Anindya Saha

Machine Learning Platform Engineer at Lyft Inc.

Building a Machine Learning Platform with OSS in 90 min

Antoni Baum

Antoni Baum is a software engineer at Anyscale, working on Ray AIR, XGBoost-Ray, and other ML libraries. In his spare time, he contributes to various open source projects, trying to make machine learning more accessible and approachable.

HuggingFace + Ray AIR Integration: A Python developer’s guide to scaling Transformers

Archit Khosla

Archit Khosla is a founding engineer and currently a Director of Machine Learning at PathAI, a start-up that aims to transform the domain of pathology using machine learning and deep learning techniques. Archit has a master's degree in machine learning from the Georgia Institute of Technology.
His 5 years of work at PathAI have involved using machine learning to drive faster, more accurate detection and diagnosis of diseases such as cancer, Inflammatory Bowel Disease (IBD), and Non-Alcoholic Steatohepatitis (NASH). Archit is passionate about improving machine learning techniques and transforming healthcare by using software engineering practices and algorithm optimization.

Lightning Talks

Barbara Rychalska

AI researcher focused on recommender systems, natural language processing, and interpretability of deep learning models. She works as AI Research Director at Synerise. She is a laureate of many international AI competitions, e.g. SemEval, SIGIR Rakuten Challenge, WSDM Booking.com Challenge, and Twitter RecSys Challenge, alongside teams from Google DeepMind, Baidu, IBM, Amazon, and NVIDIA. She is active in many international research networks, working with teams at MI2Datalab at Warsaw University of Technology, Nanyang Technological University in Singapore, Oxford University, and Jagiellonian University.

On creating behavioral profiles of your customer from event stream data – introduction to Cleora, the open-source tool for real time multimodal modeling.

Basak Eskili

Basak Eskili is a Machine Learning Engineer at Ahold Delhaize. She is working on creating new tools and infrastructure that enable data scientists to quickly operationalise algorithms. She is bridging the space between data scientists and platform engineers while improving the way of working in accordance with MLOps principles.

In her previous role, she was responsible for bringing models to production. She focused on NLP projects and building data processing pipelines. Basak also implemented new solutions by using cloud services for existing applications and databases to improve time and efficiency.

ML Model Traceability and Reproducibility by Design

Benedikt Heidrich

I completed a Master of Science degree in informatics in 2019 at the Karlsruhe Institute of Technology, where I am now working towards a PhD in Informatics. My research focuses on using deep generative models in energy systems and coping with concept drift in energy time series forecasting. Additionally, I investigate how a general pipeline architecture has to be designed for time series analysis tasks.

sktime - python toolbox for time series: pipelines and transformers

Benjamin Vincent

I work in the Bayesian data analysis space. Much of my time is spent solving real business problems consulting as a Principal Data Scientist for PyMC Labs. Before, I held a permanent faculty position for 15 years using Bayesian data analysis methods to research human decision making.

What-if? Causal reasoning meets Bayesian Inference

Brian Skinn

Brian Skinn (@btskinn) has always been a programmer (TI-82, represent!), but took a long arc through chemical engineering---B.S., Ph.D., and ten years in industrial electrochemical engineering R&D---before joining OpenTeams Incubator as a technology marketer in May 2022. Along the way, he learned a lot of VBA and a little MATLAB, Maple, and Java, before discovering Python in 2014 and never looking back. He maintains a couple of open source Python libraries, occasionally posts on his blog (https://bskinn.github.io), and reads SF/F & makes enthusiastically amateur music in his remaining spare time.

Lightning Talks

Cameron Devine PhD

Cameron Devine is a Visiting Professor of Mechanical Engineering at Saint Martin’s University. He received his PhD from the University of Washington. His research interests include control systems, robotics, and manufacturing. Currently, he lives in Olympia, WA, with his wife Tamara and their three cats.

Lightning Talks

Cameron Riddell

A top 0.5% answerer on StackOverflow, Cameron Riddell is an Associate Instructor and Tech Lead at Don’t Use This Code with over 8 years’ experience.

PyData Pub Quiz

Camille Koenders

Camille Koenders is a Software Developer at &effect. Together with Jan Dix, she programmed the dashboard presented in this talk. She holds a degree in Molecular Biotechnology from the University of Heidelberg and a degree in Computer Science from the Technical University of Berlin. Before working at &effect, she was able to gather several years of programming experience in different companies, such as SAP. She volunteers at CorrelAid, where she is one of the hosts of the data science podcast CorrelTalk.

Lessons Learned Building Our Own Dashboard Solution Using Open-Source Technologies

Caroline Arnold

Caroline is a research data scientist with the German Climate Computing Centre DKRZ.

Lightning Talks

Cheryl Roberts

Cheryl has worked in the data science and predictive modeling field for over ten years, and has a background in computer science, applied math, and actuarial science.

Parallelization of code in Python for beginners

Cheuk Ting Ho

Before working in Developer Relations, Cheuk was a Data Scientist at various companies, work that demanded strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now the Developer Advocate for Anaconda.

Besides her work, Cheuk enjoys speaking at various conferences. Cheuk also organises events for developers, including conferences such as EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in tech diversity and inclusion, Cheuk regularly organises workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation Fellow.

Trying No GIL on Scientific Programming
Too much data? When big data starts to become a bad idea
I hate writing tests, that's why I use Hypothesis

Christian Hundt

Christian is a theoretical physicist by training and holds a PhD in computer science with a passion for group theory, differential geometry, and massively parallel computing. In his current role as manager for AI Developer Technology he leads a team of dynamic and gifted engineers optimizing computational primitives for Deep Learning as well as scaling out end-to-end pipelines for a broad variety of scientific domains.

Machine Learning Frameworks Interoperability

Colleen M. Farrelly

Colleen M. Farrelly is a senior data scientist/machine learning scientist whose expertise spans NLP, TDA/geometry-based methods, time series analytics, and social network analytics. Her book, The Shape of Data, is slated for release in 2023 (https://nostarch.com/shapeofdata). She's passionate about African data science and social good initiatives.

Lightning Talks

DJ Patil

Former U.S. Chief Data Scientist. DJ is a board member for Devoted Health and former CTO. He’s a Senior Fellow at the Harvard Belfer Center, an Advisor to Venrock Partners, and a member of the DoD's Science Board. Most recently, he was Senior Staff and CTO for the Biden-Harris Transition. Dr. Patil was appointed by President Obama to be the first U.S. Chief Data Scientist, where he established nearly 40 Chief Data Officer roles. He was also Chief Scientist, Chief Security Officer and Head of Analytics and Data Product Teams at LinkedIn, where he co-coined the term Data Scientist.

Keynote - DJ Patil

Damian Romero

Damian Romero is a Ph.D. candidate in Hispanic Linguistics, focusing on corpus linguistics, computational linguistics, and natural language processing, with experience in English, Spanish, and Portuguese projects for academia and industry. He studies politeness and impoliteness in human interactions through computational linguistic methods. His secondary research interests are data annotation guidelines, UX research, and language evolution. He is also a machine learning intern at Explosion and a contributor to the Digital Humanities project Digital Tolkien (https://digitaltolkien.com/).

Data annotation for humans: Creating and refining annotation guidelines from a UX perspective

Daniel Huynh

CEO of Mithril Security.
Our goal: democratization of privacy in AI 🧠
https://github.com/mithril-security

BastionAI: Towards an Easy-to-use Privacy-preserving Deep Learning Framework

David Aronchick

David leads Compute over Data at Protocol Labs, helping, deploying and organizing the community building the next generation of the Internet.
Previously, he led Open Source Machine Learning Strategy at Azure, product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and co-founded the Kubeflow project and the SAME project. He has also worked at Amazon, Chef and co-founded three startups.
When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants) or participating in kid activities, of which there are a lot more than he remembers from when he was that age.

Revolutionizing the Big Data Age With Compute over Data

David Barrett

Since earning his Ph.D., David Barrett has spent the last twenty years teaching and applying results from computer-science research to engineer solutions for large-scale software problems. His expertise includes machine learning, software systems, networks, databases, programming languages and compiler construction. He has presented talks at peer-reviewed academic conferences on data layouts and retrieval scheduling for multimedia, and on dynamic memory allocation for programming languages. He has also presented on document databases and software vulnerabilities at NoSQLNow and the Open Source Summit.

Text to Data: Make Your Code Malleable, Not Brittle

David Chapuis

Freelance Python developer / data analyst
Works in Python, JavaScript, French, English, Spanish and Portuguese
Baker and meditation instructor in a previous life

Lightning Talks

Dean Pleban

Always learning and a builder at heart. Dean has worked on quantum optics and communication, computer vision, software development, and design – taking a multi-disciplinary approach and applying it to build products for data scientists and machine learning engineers.

Dean is currently the CEO & Co-Founder of DagsHub, a platform for data scientists and machine learning engineers, combining popular open-source tools and formats, to version their data, models, experiments, and code.

Dean is also the host of the MLOps Podcast, where he speaks with industry experts about getting ML models to production.

ML in Production – What does “Production” even mean?

Dina Bavli

Dina Bavli is a Data Scientist with experience in NLP, Graph theory, NetworkX, churn prediction, and a growing interest in ASR (automated speech recognition).
Her Master's thesis deals with classifying and characterizing persuasion. She is a former teaching assistant for ML and an international public speaker.
She is a data science content writer for workshops, meetups, and online courses, and an official author of the Towards Data Science and Better Programming publications.
Dina is passionate about data, sharing knowledge, and contributing to society. Whenever she can't find a sufficient tutorial, she creates one.

Deep Into the Tweet
Lightning Talks

Dominik Jany

Dominik is a Data Scientist at the Global Legal Entity Identifier Foundation (GLEIF). His professional focus is on achieving the highest possible data quality in the Global LEI System. In this context, he contributes to establishing best practices based on the most recent developments in the world of data standardization and analytics. Before joining GLEIF, Dominik gained experience in business development and forensic technology solutions and graduated from the Johannes Gutenberg University of Mainz with a master’s degree in Computational Sciences.

Lessons Learned Building Our Own Dashboard Solution Using Open-Source Technologies

Dominika Basaj

AI engineer and scientist with practical experience in creating and implementing solutions based on artificial intelligence. On a daily basis, she works with companies that want to use AI in their products: she helps them diagnose their needs and translate them into technical requirements. She has experience in managing projects implementing AI in products as the AI Product Owner. Her research interests lie in the area of interpretability of neural networks. As part of her PhD research, she completed research internships at Nanyang Technological University in Singapore and at the University of California at Davis. Right now she is working as Applied Data Science Lead at Synerise.

On creating behavioral profiles of your customer from event stream data – introduction to Cleora, the open-source tool for real time multimodal modeling.

Doug Davis

I'm a software engineer at Anaconda, where I contribute to the open source PyData/Scientific Python ecosystem. I primarily work in the Dask community. I decided to be a software engineer after training to be a particle physicist.

Extending Awkward Array into the broader PyData Ecosystem

Douglas Squirrel

Coach and consultant to tech teams, helping make tech teams insanely profitable since 2001

Executives at PyData

Dr. Lalitha Krishnamoorthy

Dr. Lalitha Krishnamoorthy is the Co-Founder and CEO of OpenTeams Global. In her role at OpenTeams Global, she is responsible for driving the organization's innovation, expanding the partner and network ecosystem across industries and academia, defining the portfolio and investment strategy, and delivering transformative experiences. She leads a team of open source experts that work with clients and business partners on their transformation journey through Data and AI methodologies to achieve stellar business results.

Prior to her current role, Lalitha served as Director of IBM Digital Commerce, SaaS, and Data Platforms, where she directed IBM's Digital Business Transformation strategy through data-driven, commerce-ready, subscription-first product offerings. During her 20-year career at IBM, Lalitha held roles of increasing responsibility and oversaw numerous portfolios, leading IBM’s transition from core databases to federated data to advanced analytical capabilities, and eventually data, cloud and artificial intelligence.

She serves on the champions board of the Texas Girls Collaborative Project, a University of Texas at Austin statewide network committed to motivating and supporting women and girls to pursue and thrive in careers in science, technology, engineering, and mathematics (STEM).

Lalitha holds a doctorate in Neuro-Symbolic Artificial Intelligence and several invention patents, is a regular speaker at industry conferences, serves on the board of directors for many startups, and is a strong advocate for diversity and inclusion in technology.

Today, Lalitha is most interested in making technology equitable for everyone, as this creates a new set of market leaders and competitive disruption.

Algorithms at Scale: Raising Awareness on Latent Inequities in Our Data

Dr. Sonal Kukreja

Dr. Sonal Kukreja is a passionate academician and researcher. She holds a PhD in Computer Science and works as a Professor at Bennett University. She is also the founder of ChildrenWhoCode, which is on a mission to provide quality technical education to every child in India, to prepare them for the upcoming tech market.

Responsible AI - What, Why, How and Future!

Duarte Carmo

I'm a Technologist/hacker, born and raised in Portugal, now based in Copenhagen. My work lies in the intersection of Machine Learning, Data, Software Engineering, and People. I'm in love with Technology, and how it improves people's lives.

I help large-scale companies and startups deliver value to users. I've worked with clients from all over the spectrum: from public companies to YC startups and smaller. Currently, I run my own Machine Learning consulting shop.

MLOps for the rest of us: A poor man's guide to putting models in production

Eduardo Blancas

Eduardo Blancas is the Co-Founder and CEO of Ploomber, a Y Combinator-backed company developing tools to bridge the gap between interactive data work and production. Before that, he was a Data Scientist at Fidelity Investments, where he deployed the first customer-facing Machine Learning model for asset management. Eduardo holds an M.S. in Data Science from Columbia University and a B.S. in Mechatronics Engineering from Tecnológico de Monterrey.

You don't need a cluster for that: using embedded SQL engines for plotting massive datasets on a laptop
Teaching papermill new tricks: creating custom engines for flexible notebook execution

Elijah ben Izzy

Elijah has always enjoyed working at the intersection of math and engineering. More recently, he has focused his career on building tools to make data scientists more productive. At Two Sigma, he was building infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix he leads the Model Lifecycle team — a team that focuses on streamlining the experience for data scientists to create and ship machine learning models. In his spare time, he enjoys geeking out about fractals, poring over antique maps, and playing jazz piano.

Scalable Feature Engineering with Hamilton

Emeli Dral

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to evaluate, test, and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries - from banking to manufacturing. Emeli is a data science lecturer at GSOM SpBU and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students.

Why we do ML model retraining wrong, and how to do better

Emily Hopper

Emily is a Sr Data Scientist at JW Player, the largest video player on the open web, where she has worked since January 2021. Prior to this, Emily was a research seismologist, studying deep Earth structure using earthquake data.

Using feedback loops to tune predictive models in a video ad marketplace

Erik Welch

I work on open-source at NVIDIA, including Dask, GraphBLAS and python-graphblas, NetworkX, RAPIDS (cudf, cugraph), toolz, afar, etc.

100x Faster NetworkX: Dispatching to GraphBLAS

Evgeniya

Evgeniya is a Data Evangelist at Toloka, a data labelling platform for machine learning pipelines, used worldwide by approximately 2,000 large and small businesses.
Her career path has included roles as an analyst-developer, a machine learning engineer, a solution architect and a business analyst, including 2 years of experience working with crowdsourcing. Evgeniya’s background is in Artificial Intelligence & Data Engineering, and she is currently doing her master's at the Technical University of Munich.

Crowd-Kit: A Scikit-Learn for Crowdsourced Annotations

Eyal Kazin

Ex-cosmologist turned data scientist with over 15 years of experience in solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented, and love visualising data results to communicate insights for better decisions within organisations.

My main drive as a data scientist is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I have a passion for quantifying and communicating the impact of interventions to non-specialist audiences in an accessible manner.

My claim to fame is living on four different continents within the span of a decade (2004-2014), including three tennis Grand Slam cities (NYC, Melbourne, London).

Start asking your data “Why?” - A Gentle Introduction To Causal Inference
Lightning Talks
Don't Stop 'til You Get Enough - Hypothesis Testing Stop Criterion with “Precision Is The Goal”

Eyal Shnarch

I hold a PhD in computer science, in the field of Natural Language Processing and Machine Learning. I joined IBM Research for Project Debater, where I was responsible for its NLP pipeline and tools. I later became a team lead working on Argument Mining, Argument Quality, Weak Supervision, and Debate Rebuttal. Today I manage a group researching various language technologies.

Create text classifiers in a few hours using the open-source, no-code Label Sleuth system

Fabio Pliger

Fabio Pliger is the co-creator of PyScript and a Principal Software Architect at Anaconda, Inc, where he has been working for 8 years. Fabio is a fellow member of the PSF and the EuroPython Society, one of the Founders of the Python Italia association and former Chairman of the EuroPython Society, where he served from 2012 to 2016. As part of his work at both associations, he helped organize and co-chair several PyCon and EuroPython conferences over the years.

He currently lives in beautiful Austin, Texas with his family, after having spent most of his life in Italy and Brazil.

PyScript and Data Science: a love story

Felipe de Pontes Adachi

Felipe is a Data Scientist at WhyLabs. He is a core contributor to whylogs, an open-source data logging library, and focuses on writing technical content and expanding the whylogs library in order to make AI more accessible, robust, and responsible. Previously, Felipe was an AI Researcher at WEG, where he researched and deployed Natural Language Processing approaches to extract knowledge from textual information about electric machinery. He also holds a Master's degree in Electronic Systems Engineering from UFSC (Universidade Federal de Santa Catarina), with research focused on developing and deploying fault detection strategies based on machine learning for unmanned underwater vehicles. Felipe has published a series of blog articles about MLOps, Monitoring, and Natural Language Processing in publications such as Towards Data Science, Analytics Vidhya, and Google Cloud Community.

Visually Inspecting Data Profiles for Data Distribution Shifts

Franz Kiraly

Principal Data Scientist and Practice Lead at GfK.

Founder and core developer of the sktime python toolbox for machine learning with time series.

sktime - python toolbox for time series: pipelines and transformers

Gabriel Birnbaum

I am a computational scientist fascinated with the physics of matter and high energy systems. I have previously researched computational methods to solve the equations of plasma physics – as well as ways to compute battery parameters from quantum mechanical codes.

Developing Battery Materials with Python

Gabriela de Queiroz

Gabriela de Queiroz is a Principal Cloud Advocate Manager at Microsoft. She leads and manages the Global AI/ML/Data team in Education Advocacy.

Before that, she worked at IBM as a Program Director on Open Source, Data & AI Technologies and then as Chief Data Scientist at IBM, leading AI Strategy and Innovations.

She is the founder of AI Inclusive, a global organization that is helping increase the representation and participation of gender minorities in Artificial Intelligence. She is also the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 200 chapters in 55+ countries.

She has worked in several startups, where she built teams, developed statistical models, and employed a variety of techniques to derive insights and drive data-centric decisions. She likes to mentor and share her knowledge through mentorship programs, tutorials, and talks.

Keynote - Gabriela de Queiroz

Gatha

Hello there! I am a blogger who writes about AI and privacy. I am currently wrapping up my Ph.D. in data science. I'm a dog mom and love painting watercolors.

Explaining Why You have a Favorite Cereal

Geir Arne Hjelle

Geir Arne teaches Python at Real Python. He has a background in mathematics and has worked with data analysis in different fields such as electricity markets, satellite geodesy, and computer vision. In his spare time, Geir Arne enjoys hammock camping, square roots, and aimless forest wandering.

Maps, Maps, Maps!

Gemma Turon

Trained as a molecular biologist, Gemma completed a PhD in colorectal cancer and stem cells at IRB Barcelona in 2019, before taking a one-year break to focus on working and volunteering in the third sector. This shifted her scientific interest to global health and neglected diseases, and the existing barriers to tackle some of the most urgent health issues in developing countries. With Ersilia, she aims to explore new ways of community building and engagement in the scientific arena, at the intersection between academia, biotech start-ups and NPOs.

Real-world Perspectives to Avoid the Worst Mistakes using Machine Learning in Science

Georgios Balikas

Georgios Balikas is a Lead Data Scientist at Salesforce Search. He works on building production models for machine learning applications such as named entity recognition, classification, ranking and question answering. He holds a PhD from the University of Grenoble Alps on the intersection of machine learning and natural language processing.

Converting sentence-transformers models to a single tensorflow graph

Goku Mohandas

🌏 Founder @MadeWithML
🍎 AI Research @Apple
⚕️ ML Lead @Ciitizen (acq.)
🎓 CS/ML @GeorgiaTech
🎓 Chem/Bio @JohnsHopkins

Real-world Perspectives to Avoid the Worst Mistakes using Machine Learning in Science

Hadley Wickham

Hadley is Chief Scientist at RStudio, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g. roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website.

Embracing multi-lingual data science

Hagop Dippel

Hagop Dippel is an Applied Scientist at Zalando, where he focuses on building demand forecasting and inventory optimisation applications. He particularly enjoys bringing research ideas to end-2-end production systems. He’s passionate about deep learning applied to real-world industry use cases and human centred AI. Hagop studied Data Science and Econometrics at the Aix-Marseille University in France. In his free time, you can find him cycling or running (just contact him if you want to jog and/or chat!)

Probabilistic demand forecasting at scale

Hajime Takeda

I started my career as a data analyst at a global consumer goods company. Currently, I am a leader in data analytics, web development, and digital marketing at a startup.

Media Mix Modeling: How to Measure the Effectiveness of Advertising in Python

Han Wang

Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the founder of the Fugue project, aiming at democratizing distributed computing and machine learning.

Testing Big Data Applications (Spark, Dask, and Ray)

Haoyin Xu

PhD Student @ UCSD

Missing Data in the Age of Machine Learning

Haw-minn Lu

Dr. Haw-minn Lu is currently a Principal Data Scientist for Data Science/Machine Learning at West Health Institute in La Jolla, a nonprofit medical research organization. Dr. Lu earned his PhD in 1998 from the Electrical and Computer Engineering Department at the University of California, San Diego after receiving SM and SB degrees from the Massachusetts Institute of Technology.

Dr. Lu has been doing machine learning for over 25 years. His interests include machine learning, interactive visualization, data imputation/anonymization, and computing infrastructure.

Prior to joining West Health, Dr. Lu was involved in several startups using Python as the core infrastructure for applications such as ecommerce, network communications, and digital animation.

Missing Data in the Age of Machine Learning

Hugo Bowne-Anderson

Hugo Bowne-Anderson is Head of Developer Relations at Outerbounds, a company committed to building infrastructure that provides a solid foundation for machine learning applications of all shapes and sizes. He is also host of the industry podcast Vanishing Gradients. Hugo is a data scientist, educator, evangelist, content marketer, and data strategy consultant, with extensive experience at Coiled, a company that makes it simple for organizations to scale their data science seamlessly, and DataCamp, the online education platform for all things data. He also has experience teaching basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, conferences such as SciPy, PyCon, and ODSC and with organizations such as Data Carpentry. He has developed over 30 courses on the DataCamp platform, impacting over 2 million learners worldwide through his own courses. He also created the weekly data industry podcast DataFramed, which he hosted and produced for 2 years. He is committed to spreading data skills, access to data science tooling, and open source software, both for individuals and the enterprise.

Full-stack Machine Learning for Data Scientists

Ian Ozsvald

Ian is a Chief Data Scientist who has helped co-organise the annual PyDataLondon conference, raising $100k+ annually for the open source movement, along with the associated 11,000+ member monthly meetup.

Using data science he's helped clients find $2M in recoverable fraud, created the core IP which opened funding rounds for automated recruitment start-ups and diagnosed how major media companies can better supply recommendations to viewers. He gives conference talks internationally often as keynote speaker and is the author of the bestselling O'Reilly book High Performance Python (2nd edition). He has over 20 years of experience as a senior data science leader, trainer and team coach. For fun he's walked by his high-energy Springer Spaniel, surfs the Cornish coast and drinks fine coffee. Past talks and articles can be found at:

https://ianozsvald.com/
https://github.com/ianozsvald/
https://twitter.com/ianozsvald
https://www.linkedin.com/in/ianozsvald/

Data Science Project Patterns that Work
Executives at PyData

Irina Klein

MAS Data Science, Illinois Institute of Technology

IMF Data Discovery and Collection

Isaac Slavitt

Isaac is a co-founder and principal data scientist at DrivenData, Inc. He holds a master's in Computational Science and Engineering from Harvard’s School of Engineering and Applied Sciences and a BS in Operations Research from the U.S. Coast Guard Academy, and previously spent seven years as a Coast Guard officer serving in a variety of operational and quantitative roles.

The 10 commandments of reliable data science

Isabel Zimmerman

Isabel Zimmerman is a software engineer on the open source team at RStudio, where she works on building MLOps frameworks. When she's not geeking out over new data science techniques, she can be found hanging out with her dog or watching Marvel movies.

Practical MLOps for better models

Isac Moura Gomes

Brazilian, Software Engineer and AI enthusiast.
MSc of Computing at Federal University of Ceara.
Cinema lover, astrophysics enthusiast and traveller.

Mixing art with Python: an introduction to Style Transfer

Itamar Turner-Trauring

Itamar Turner-Trauring is the creator of Sciagraph, a performance observability service for Python data pipelines, allowing you to get performance and memory profiling for your production batch jobs. He is also the author of the open source Fil memory profiler for Python. He writes about Python performance at https://pythonspeed.com.

Speed up Python data processing with vectorization

Jan Dix

Jan Dix is co-founder and Head of Software Development at &effect. His technical interest cuts across Software Engineering, Data Science, and Visualization. At &effect, he helps organizations in the public and social sector to make effective decisions. In his daily work, Jan Dix is overwhelmingly using and enjoying open-source technologies.

He holds a Master in Social and Economic Data Analysis (University of Konstanz) and a Master in Global Studies (Göteborgs Universitet). Since 2015, he has been volunteering at CorrelAid and has supported numerous non-profit organizations in implementing Data Science projects, given talks and workshops, and has been involved in the mentoring program.

Lessons Learned Building Our Own Dashboard Solution Using Open-Source Technologies

Jan Ittner

Computer scientist specialising in Artificial Intelligence, Machine Learning & Software Engineering. Co-founder of BCG GAMMA and member of GAMMA's leadership.

Exploring Feature Redundancy and Synergy with FACET 2.0 - and Why You Need It to Interpret ML Models Correctly

Jarek Potiuk

Independent Open-Source Contributor and Advisor, Committer and PMC member of Apache Airflow, Member of the Apache Software Foundation

Jarek is an Engineer with broad experience in many subjects: Open-Source, Cloud, Mobile, Robotics, AI, Backend, and Developer Experience. He also has a lot of non-engineering experience under his belt: running a company, being a CTO, organizing big international community events, technical sales support, PR and marketing advisory, as well as looking at the legal aspects of licensing and building open-source communities.

With experience in very small and very big companies and everything in between, Jarek found his place in the Open-Source world, where his internal individual-contributor drive can be used to its full potential.

Managing Python Dependencies at scale

Jay Chia

Jay is a cofounder of Eventual and a primary contributor to the Daft open-sourced project. Prior to Eventual, he was a software engineer building large scale ML data systems for computational biology at Freenome and self-driving cars at Lyft. He hails from the sunny island nation of Singapore, and used to command a platoon of tanks in the Singapore military.

Daft: the Distributed Python Dataframe for "Complex Data" (images, video, documents and more)

Jean-Martin Archer

JM (Jean-Martin) is a Staff Data Engineer at Shopify and part of the Data Foundations team which provides the primitives (compute, query, orchestration, etc.) leveraged by the data science and analytics teams. Before Shopify he developed data-intensive applications in the supply chain, financial, and energy software industry. He is an avid cyclist living in Victoria, BC, Canada with his wife, relatively young son, and two cats who believe they are dogs.

Apache Airflow at Scale: Let's Discuss
Running Apache Airflow at Scale

Jeff Hale

Jeff is passionate about helping people and organizations learn data skills and use data more effectively. As a Developer Advocate at Prefect, Jeff helps people coordinate their dataflows. Jeff has taught over 200 data science lessons and written widely on data-related topics. He has been designated a top Medium writer in the areas of Artificial Intelligence and Technology and authored several books. He co-organizes the Data Science DC Meetup and is a board member of Data Community DC.

Better Python Coding with Prefect Blocks

Jesper Dramsch

Jesper Dramsch works at the intersection of machine learning and physical, real-world data. Currently, they're working as a scientist for machine learning in numerical weather prediction at the coordinated organisation ECMWF.

Previously, Jesper worked on applied exploratory machine learning problems, e.g. satellites and Lidar imaging on trains, and defended a PhD in machine learning for geoscience. During the PhD, Jesper wrote multiple publications and often presented at workshops and conferences, eventually holding keynote presentations on the future of machine learning.

Moreover, they have worked as a consultant and as a machine learning and Python educator in international companies and the UK government. Their courses on Skillshare have been watched over 30 days by over 5000 students. Additionally, they create educational notebooks on Kaggle, reaching rank 81 worldwide. Recently, Jesper was invited into the YouTube Partner programme.

Real-world Perspectives to Avoid the Worst Mistakes using Machine Learning in Science
Too much data? When big data starts to become a bad idea

Jim Kitchen

Jim Kitchen is a Sr. Software Engineer at Anaconda, focused on graph analytics and sparse data. He is a member of the GraphBLAS C API committee and is an author of the python-graphblas library.

100x Faster NetworkX: Dispatching to GraphBLAS

John Sandall

John Sandall is the CEO and Principal Data Scientist at Coefficient.

His experience in data science and software engineering spans multiple industries and applications, and his passion for the power of data extends far beyond his work for Coefficient’s clients. In April 2017 he created SixFifty in order to predict the UK General Election using open data and advanced modelling techniques. Previous experience includes Lead Data Scientist at YPlan, business analytics at Apple, genomics research at Imperial College London, building an ed-tech startup at Knodium, developing strategy & technological infrastructure for international non-profit startup STIR Education, and losing sleep to many hackathons along the way.

John is also a co-organiser of PyData London, co-founded Humble Data in 2019 to promote diversity in data science through a programme of free bootcamps, and in 2020 was a Committee Chair for the PyData Global Conference. He is currently a Fellow of Newspeak House with interests in open data, AI ethics and promoting diversity in tech.

Too much data? When big data starts to become a bad idea

Joris Van den Bossche

I am a core contributor to Pandas and Apache Arrow, and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of python (pandas) at Ghent University.

On copies and views: updating pandas' internals (a.k.a. “Getting rid of the SettingWithCopyWarning”)

Jose Mesa

Jose Mesa is an Advance Software Engineer at Anaconda, supporting the development of Anaconda Nucleus. Before joining Anaconda, Jose was a Facilities Engineer in the Subsea, Civil, and Marine Engineering unit at Chevron. He received his Ph.D. from the University of Michigan Naval Architecture and Marine Engineering (NA&ME) department in 2018 and holds two MSE degrees in Aerospace Engineering (2016) and NA&ME (2015). He completed his dual Bachelor's in Civil Engineering and Land Surveying at the University of Puerto Rico in 2013. While completing his graduate studies Jose worked at Boeing, supporting component and full-scale aircraft fatigue testing.

Navigating Career Adjustments in Times of Uncertainty

Joseph Lucas

Joe loves the intersection of data science and security. He is currently an Offensive Security Researcher for Artificial Intelligence at NVIDIA and was previously a TPM on the AWS Red Team. He's a fan of the PyData ecosystem and has spoken at PyCon US and made small contributions to several open source libraries. He maintains a repository of machine learning security puzzles (HackThisAI) and recently worked with the AI Village to run a Capture-the-Flag competition at DEFCON30.

Mischief Managed: What hackers can do on your Jupyter instance

Josh Seltzer

Full-stack / ML developer by day, primatologist by night. I am interested in the intersection of technology and cognition, and how it will transform our minds, languages, and world(s).

Lightning Talks

Jovan Veljanoski

Jovan is a senior data scientist at Tiqets, where he creates predictive models and recommender systems centered around the e-commerce domain. Working mostly with Python in the Jupyter/PyData ecosystem, he has considerable experience in creating dashboards, clustering analysis and predictive modeling. Jovan has a PhD in Astrophysics, is a co-founder of vaex.io, and is interested in novel machine learning technologies and applications.

Vaex: the perfect DataFrame Library for Python data apps

Juan Luis Cano Rodríguez

Juan Luis (he/him/él) is an Aerospace Engineer with a passion for STEM, programming, outreach, and sustainability. He works as Data Scientist Advocate at Orchest, where he empowers data scientists by building an open-source, scalable, easy-to-use workflow orchestrator. He has worked as Developer Advocate at Read the Docs, as software engineer in the space, consulting, and banking industries, and as a Python trainer for several private and public entities.

Apart from being a long-time user and contributor to many projects in the scientific Python stack (NumPy, SciPy, Astropy) he has published several open-source packages, the most important one being poliastro, an open-source Python library for Orbital Mechanics used in academia and industry.

Finally, Juan Luis is the founder and former chair of the Python España association, the point of contact for the Spanish Python community, former organizer of PyCon Spain, which attracted 800 attendees in its last in-person edition in 2022, and current organizer of the PyData Madrid monthly meetups.

Expressive and fast dataframes in Python with polars

Kacper Łukawski

Kacper Łukawski is a Developer Advocate at Qdrant - an open-source neural search engine. His broad experience is mostly related to data engineering, machine learning, and software design. He has been actively contributing to the discussion on Artificial Intelligence by conducting lectures and workshops locally and internationally.

Lightning Talks

Kalyan Prasad

Kalyan has presented talks at prestigious conferences and educational institutions such as PyData Global, JupyterCon, PyCon India, PyCon APAC, PyCon Hong Kong, PyCon JP, PyCon ZA, Pyjamas, Developer Conference Telangana 2021, BelPy, and KLS Gogte Institute of Technology, Belagavi, Karnataka, India. He has worked as a reviewer and mentor for reputed conferences and hackathons, including PyCon US, JupyterCon, PyData Global, PyCon India, PyConf Hyderabad, and Manthan 2021. Kalyan has also contributed to various tech communities. He enjoys being involved with these communities and helping them grow. He is currently associated with the following organizations:
PyCon India – Review Panel Work Group Lead
PyConf Hyderabad – Organizing Committee Member
PyData Global Impact Mentoring Program – Mentor
Hyderabad Python Users Group – Core Member/Meetups Organizer
Humans for AI – Program Manager for AI learning Community
Links to some of his previous and upcoming conference talks:
https://geekle.us/schedule/datascience
https://pydata.org/global2021/schedule/
https://hopin.com/events/pyconindia2020#schedule
https://cfp.jupytercon.com/2020/schedule/general-sessions/
https://www.pycon.se/
https://belpy.in/schedule.html
https://www.linkedin.com/feed/update/urn:li:activity:6838155804479242240/ (Invited talk)
https://pycon.hk/2021/2021-schedule/
https://2021.pycon.jp/time-table
https://th.pycon.org/

ARCH/GARCH Models Tour

Katrina Riehl

Katrina is the Head of the Streamlit Data Team at Snowflake. She is joining Georgetown University as adjunct faculty this Spring. She also volunteers as the President of the Board of Directors at NumFOCUS, a non-profit supporting the PyData open source ecosystem. For almost two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data mining, and visualization. Most notably, she has helped lead data science efforts at the University of Texas Austin Applied Research Laboratory, Apple, HomeAway (now, Vrbo), and Cloudflare. Katrina received MS and PhD degrees in Computer Science from the University of Texas at Dallas.

Too much data? When big data starts to become a bad idea

Kaushik Bokka

Kaushik Bokka is a Senior Research Engineer at Lightning AI and one of the core maintainers of the PyTorch Lightning library. He has prior experience in building production scale Machine Learning and Computer Vision systems for several products ranging from Video Analytics to Fashion AI workflows. He has also been a contributor to a few other open-source projects and aims to empower the way people and organizations build AI applications.

Supercharge your training on TPUs

Kefentse Mothusi

Data enthusiast with a touch of development

Lightning Talks

Kilian Kluge

My journey into Python started in a physics research lab, where I discovered the merits of loose coupling and adherence to standards the hard way. I like automated testing, concise documentation, and hunting complex bugs.

I recently completed my PhD on the design of human-AI interactions and now work to use Explainable AI to open up new areas of application for AI systems.

Do You Follow What I’m Explaining? A Practitioner’s Guide to Opening the AI Black Box for Humans

Kurt Schelfthout

Kurt has worked in a variety of application domains, from logistics to most recently finance. He's currently working on Meadowrun (https://meadowrun.io), to make cloud compute more easily accessible for data scientists and data engineers.

Kurt writes software engineering and computing science deep dives at Get Code (https://getcode.substack.com), and hot takes on Twitter as @kurt2001 (https://twitter.com/kurt2001) or Mastodon as @kurts (https://mastodon.online/@kurts).

Lightning Talks

Lara Kattan

Lara is a data scientist (but who isn't these days), curriculum developer and instructor. She's a data science manager at Ernst & Young (EY) and an adjunct at the University of Chicago's Booth School of Business. When she's not writing Python code, she's probably taking care of foster kittens or falling on the ice (thanks to a recent but so far not-very-successful attempt to learn to figure skate).

Simulations in Python: Discrete Event Simulation with SimPy

Lauren Oldja

Lauren is a Principal Data Scientist at the social good software company Bonterra, working primarily on political and non-profit fundraising business lines.

Lauren is also a frequent NumFOCUS volunteer. She is the Inaugural Chair of the NumFOCUS Champions Circle, a volunteer-led committee focused on generating leads and informing fundraising strategy for NumFOCUS. This is also Lauren's third time as Executive Conference Chair for PyData NYC, having chaired the previous two events in 2018 and 2019.

Executives at PyData

Laysa Uchoa

Laysa is a Developer Advocate at Aiven, a company that offers a fully managed, OSS cloud data platform. She is the Head of Fullstack Events Association in Munich and the Leader of PyLadies Munich. She is also an OSS contributor and organizer of OCPP (Open Charge Point Protocol) communities.

Super Search with OpenSearch and Python

Liron Faybish

Liron has been a Data Scientist at PayPal for more than 5 years. She works for the cyber security threat oversight team in the information security organization. Her focus is finding machine learning solutions for information security problems in general and specifically for insider fraud threats.

Liron holds an MSc in software and information systems engineering. In her thesis she researched user verification on mobile devices using sequences of touch gestures. The full paper from this thesis was published at the PAKDD 2018 conference, and an extended abstract of this research was published at the UMAP 2017 conference.

Detecting anomalous sequences using text processing methods

Lu Qiu

Lu Qiu is a machine learning engineer at Alluxio and is a PMC maintainer of the open source project Alluxio. Lu develops big data solutions for AI/ML training. Before that, Lu was responsible for core Alluxio components including leader election, journal management, and metrics management. Lu received an M.S. degree in Data Science from George Washington University.

How to Eliminate the I/O Bottleneck and Continuously Feed the GPU While Training in the Cloud

Lucas Wood

Luke Wood is a Machine Learning Specialist and a Software Engineering Generalist. Currently, Luke focuses on making KerasCV a powerful and expressive library for solving common Computer Vision tasks and on publishing high-quality research at top Machine Learning conferences. Luke currently works full time at Google on the Keras team, and is pursuing his Doctorate in Machine Learning at UC San Diego under Peter Gerstoft.

Object Detection with KerasCV

Lutz Ostkamp

Data Engineer at Flixbus

Lightning Talks

Maarten Breddels

Maarten Breddels is an entrepreneur and freelance developer/consultant/data scientist working mostly with Python, C++ and Javascript in the Jupyter ecosystem. Creator of Solara, ipyvolume and vaex, founder of Vaex.io and Solara. His expertise ranges from fast numerical computation, API design, to 3d visualization. He has a Bachelor in ICT, a Master and PhD in Astronomy, likes to code and solve problems.

Vaex: the perfect DataFrame Library for Python data apps

Malte Tichy

After pursuing his PhD and postdoc research in theoretical quantum physics, Malte joined Blue Yonder as a Data Scientist in 2015. Since then, he has led numerous external and internal projects, which all involved programming in Python; creating, working with and evaluating probabilistic predictions; and communicating the achieved results.

Knowing what you don’t know matters: Uncertainty-aware model rating

Maria Feria

Maria Feria has over 15 years of experience working in the data and analytics space. Her journey in the field began in the intersection of Environmental Sciences and Geographic Information Systems where she created empirical data models and developed distributed geospatial databases.
She specialises in Data Engineering and has worked on leading data initiatives in a number of large corporations as well as startups, ranging from the productionisation of Machine Learning models on a global scale to Big Data migration, Data Governance and Management.

A Practical Approach To Unlock Value From Data and Analytics

Maria Vechtomova

Maria is a Senior Machine Learning Engineer at Ahold Delhaize. Maria is bridging the gap between data scientists, infra and IT teams at different brands and focuses on standardization of machine learning operations across all the brands within Ahold Delhaize.
During nine years in Data & Analytics, Maria has worked in different roles, from data scientist to machine learning engineer, has been part of teams in various domains, and has built broad knowledge. Maria believes that a model only starts living when it is in production. For this reason, for the last seven years, her focus has been on the automation and standardization of processes related to machine learning.

ML Model Traceability and Reproducibility by Design

Martha L Escobar-Molano

Martha's professional experience in research and development spans decades of work in both academia and industry. After earning her PhD, she worked on research in databases, multimedia systems, machine learning and natural language processing. Her research work has been published in peer-reviewed journals and academic conference proceedings. Her development experience spans multiple industries, including: high performance parallel systems, cybersecurity, fintech, and healthcare.

Text to Data: Make Your Code Malleable, Not Brittle

Martin Durant

Staff Software Engineer

Single node shared memory comes to dask

Martin Walter

https://www.linkedin.com/in/aiwalter/

sktime - python toolbox for time series: pipelines and transformers

Marwa Ahmed

Hello, I'm a 3X AWS-certified data analytics/ML specialist focusing on building end-to-end analytical solutions in the AWS Cloud for small and medium sized businesses. I primarily work with clients to help them build out data architectures that are scalable, reliable and efficient, and then help them explore and build the additional analytics capabilities and data-driven solutions they need to make better business decisions or better serve their customers.
I have worked with clients across multiple industries such as e-commerce, digital marketing, politics and NGOs.
Outside of work, I play sports 2 times per day and I am a professional diver who loves travelling around the world chasing sharks and dolphins while learning a word or two in different languages.

Modern Analytics in the Cloud - A case for fraud detection

Marysia Winkels

Marysia is a Data Scientist and Data Science Educator at GoDataDriven. In addition to this, she is also chair of the PyData Amsterdam committee.

Data-Centric AI Cookbook: let's prep that data
Data Storytelling through Visualization

Mateusz Sokół

I'm an AI Software Engineer working at BCG Gamma, passionate about ML, data science, functional programming and software excellence. In my free time I enjoy swimming and hiking.

Exploring Feature Redundancy and Synergy with FACET 2.0 - and Why You Need It to Interpret ML Models Correctly

Matt Harrison

Matt has a CS degree from Stanford University. He is a best-selling author on Python and Data subjects. His books: Effective Pandas, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm have all been best-selling books on Amazon. He just published Machine Learning Pocket Reference and Pandas Cookbook (Second Edition). He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), Universities (Stanford, University of Utah, BYU), as well as small companies. He has been using Python since 2000 and has taught thousands through live training both online and in person.

Testing Pandas: Shoots, leaves, and garbage!

Matthew Rocklin

Matthew is a long time open source software developer in the Python data ecosystem. He’s worked on several libraries, but is primarily known for his work on Dask, a library for parallel computing in Python. Matthew started working on Dask at Anaconda, then moved to NVIDIA, and then finally built his own company around Dask named Coiled.

Deploying Dask

Meriem Bendris

Meriem is a senior Deep Learning data scientist at NVIDIA, supporting partners delivering AI/deep learning solutions. Meriem's area of expertise is large scale Natural Language Processing and conversational AI. Meriem holds a Ph.D. in signal and image from Telecom ParisTech, where she studied machine learning applied to audio-visual content.

Building Large-scale, Localized Language Models: From Data Preparation to Training and Deployment to Production.

Merve Noyan

Merve Noyan is a data scientist and a natural language processing researcher. She works as a developer advocacy engineer at Hugging Face and is a Google Developer Expert in machine learning.

Improving production workflows for scikit-learn models with skops

Michael Petro

Michael is a Data Engineer at Shopify. Currently Michael works on the team focused on providing intuitive access to the computing and orchestration building blocks for applications and platforms that transform and query data across Shopify.

Apache Airflow at Scale: Let's Discuss
Running Apache Airflow at Scale

Michiel De Smet

Michiel is a long-time Software & Data Engineer specialized in implementing Data Platforms.

At Starburst Data Michiel is hyper-focussed on improving the Starburst Python and Lakehouse ecosystem.

In his spare time Michiel is also the maintainer of the popular vscode dbt extension, dbt Power User.

Building Data Products in a Lakehouse using Trino, dbt, and Dagster

Miguel Martínez

Miguel Martínez is a senior deep learning data scientist at NVIDIA, concentrating on Recommender Systems, NLP and Data Analytics. Previously, he mentored students at Udacity's Artificial Intelligence Nanodegree. He has a strong background in financial services, mainly focused on payments and channels. As a constant and steadfast learner, Miguel is always up for new challenges.

Building Large-scale, Localized Language Models: From Data Preparation to Training and Deployment to Production.
Machine Learning Frameworks Interoperability

Mike Rothenhäusler

Mike is a Data Scientist specializing in NLP and Explainable AI. He is currently working on his Master's thesis on generating user-centric explanations for bi-modal Transformers.

Everything you need to know about Transformer Models

Mike Walmsley

Postdoc using ML and citizen science to answer astrophysics questions. Lead data scientist for Galaxy Zoo.

Real-world Perspectives to Avoid the Worst Mistakes using Machine Learning in Science

Mirae L Parker

Currently pursuing a PhD in Computational Biology. Mirae joined the sktime team in the summer of 2022 as part of Google Summer of Code and has since stayed on as a core developer!

Learn more at: https://miraeparker.com/

sktime - python toolbox for time series: pipelines and transformers

Morgane Mahaud

Data scientist at Spark hq, an Irish consulting company specialized in data projects. I started my career with a PhD in materials science simulations and have spent a few years as a data engineer, including two at AWS. Since joining Spark hq, I have had the opportunity to be lead data scientist on several projects. I guide people underwater in my free time.

Steering a data science project

Morgane Mahaud

Data scientist at Spark, an Irish consulting company. Since joining the company, I have had the opportunity to lead the data science part of a few projects. I have a PhD in materials science (polymer networks simulation) and some years of experience as a data engineer, including at AWS. I guide people underwater in my free time.

Steering a data science project

Moritz Meister

Moritz Meister is a Software Engineer at Hopsworks, leading the development of the Hopsworks Feature Store. He has a background in Econometrics and holds MSc degrees in Computer Science from Politecnico di Milano and Universidad Politecnica de Madrid. He has previously worked as a Data Scientist on projects for Deutsche Telekom and Deutsche Lufthansa in Germany, helping them to productionize machine learning models to improve customer relationship management.

Data Validation for Feature pipelines: Using Great Expectations and Hopsworks

Mridul Seth

I am currently working on the NetworkX open source project (work funded through a grant from the Chan Zuckerberg Initiative!). I am also collaborating with folks from the Scientific Python project (Berkeley Institute for Data Science), Anaconda Inc and GESIS, Germany. Before this I worked on the GESIS notebooks and gesis.mybinder.org.
I am also interested in the development and maintenance of the open source data & science software ecosystem. I try to help around with the broader Scientific Open Source ecosystem wherever possible. To share my love of Python and Network Science, I have presented workshops at multiple conferences like PyCon US, SciPy US, PyData London and many more!

100x Faster NetworkX: Dispatching to GraphBLAS

Myles Mitchell

Myles holds a PhD in Astrophysics and works as a Data Scientist at Jumping Rivers. With nine years of experience in Python programming, he enjoys applying his knowledge to a wide variety of projects ranging from astronomy to sport science. He is also deeply passionate about sharing his expertise with others, and has taught courses spanning data visualisation and machine learning with Python.

Data visualisation with Seaborn

Narendra Mukherjee

I am a Research Scientist at Philips Research in the Netherlands and have wide interests in Bayesian statistics, probabilistic modelling and open source development (and how bicycles are central to sustainable cities!). Please visit my website to learn more about me or to get in touch: https://narendramukherjee.github.io/

Interpretable and realistic generative models in data science? Likelihood-free Bayes’ says yes!

Niels Bantilan

Niels is a machine learning engineer, core maintainer of Flyte, and creator of UnionML and Pandera, a data testing tool for dataframe-like objects.

He has a Master's in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology.

His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems. He enjoys developing open source tools to make data science and machine learning practitioners more productive.

Production-grade Machine Learning with Flyte

Nikita Demir

Founding ML Engineer at Galileo. Previously MS/BS in CS at Stanford.

Critical CV/NLP Data Errors and How to Fix Them with Galileo

Niranjan G S

Niranjan G S is a Manager at AI Labs, Subex. His work at Subex revolves around fraud detection in the telecommunication domain and democratizing AI by helping build HyperSense, a no-code platform for creating and deploying ML workflows. He was previously a data scientist at Amazon, where he built AI solutions for detecting fraud in the e-commerce domain.

Generate Actionable Counterfactuals using Multi-objective Particle Swarm Optimization

Oriol Abril Pla

My name is Oriol Abril Pla. I have a background in engineering physics and astrophysics, but I currently work as a computational statistician. I am a core contributor and council member of the ArviZ and PyMC projects. I have also worked on statistical research while at Helsinki University, especially in the fields of inference diagnostics, prior elicitation and data visualization.

I dedicate a lot of my time to community management and documentation because I believe they are as important as the code. I have helped organize and mentored at multiple Data Umbrella sprints. I have also mentored many new ArviZ and PyMC team members whose backgrounds ranged from computational scientist to technical writer.

Working session for the Bayesian Python Ecosystem

Paco Nathan

Managing Partner at Derwen, Inc. Known as a "player/coach", with core expertise in graph technologies, natural language, data science, cloud computing; ~40 years tech industry experience, ranging from Bell Labs to early-stage start-ups. Board member for Recognai; Advisor for Amplify Partners, Data Spartan, KUNGFU.AI. Lead committer on PyTextRank, kglab. Formerly: Director, Community Evangelism for Apache Spark at Databricks.

Data Prep for Graphs

Parisa Gregg

Parisa is a Data Scientist at Jumping Rivers. She enjoys using Python to visualise and extract information from data. As a trainer, she loves sharing her knowledge, and has experience delivering courses on a variety of topics, from visualisation to machine learning. Her enthusiasm for Python and data science was developed during her PhD in Particle Physics with the CDT for Data Intensive Science at Durham University.

Data visualisation with Seaborn

Peter Vidos

Peter is the CEO & Co-Founder of Vizzu.

His primary focus is understanding how Vizzu's innovative approach to data visualization can be put to good use. Listening to people complaining about their current hurdles with building charts and presenting them is his main obsession, next to figuring out how to help data professionals utilize the power of animation in dataviz.

Peter has been involved with digital product development for over 15 years. Earlier products/projects he worked on cover mobile app testing, online analytics, data visualization, decision support, e-learning, educational administration & social. Still, building a selfie teleport just for fun is what he likes to boast about when asked about previous experiences.

ipyvizzu-story - a new, open-source charting tool to build, create and share animated data stories with Python in Jupyter

Pia Mancini

Pia is a democracy activist, open source sustainer, and co-founder and CEO of Open Collective, a platform that enables communities around the world to raise and spend funds in full transparency. Last year, collectives raised USD 37M, effectively unlocking access to impact funds around the world. She is also co-founder and President of The Open Source Collective, a non-profit that provides a financial and admin home for 3,000+ open source projects around the globe, granting them access to project-directed funding. Pia is also co-founder and Chair of Democracy Earth Foundation, a Y Combinator-backed non-profit dedicated to developing technology for democracy around the world.

Keynote - Pia Mancini

Przemysław Denkiewicz

Software Engineer from Starburst. Passionate about Data Engineering, Big Data and all things data related.

Building Data Products in a Lakehouse using Trino, dbt, and Dagster

Quan Nguyen

Quan Nguyen is a Python programmer and machine learning enthusiast. He is interested in solving decision-making problems that involve uncertainty. Quan has authored several books on Python programming and scientific computing. He is currently pursuing a Ph.D. degree in computer science at Washington University in St. Louis where he does research on Bayesian methods in machine learning.

Bayesian Optimization: Fundamentals, Implementation, and Practice

Quincy Larson

Founder of freeCodeCamp.org. Quincy Larson is a teacher and school director from Oklahoma. At age 31, he started learning to code using free online courses and attending hackathons. After working as a software engineer, he founded freeCodeCamp.org to help other busy adults also learn to code and transition into tech careers. More than a million people now use freeCodeCamp courses each day, and tens of thousands of people have used it to successfully transition into software development careers.

Keynote - Quincy Larson

Quizmaster James Powell

James Powell has hosted PyData pub quizzes since the first conference. Come see what he has prepared this year.

PyData Pub Quiz

Rahul Baboota

I am a huge admirer of Machine/Deep Learning and an Applied Scientist at Microsoft. I have been fascinated with the world of data and machine learning since I first learned about it and have been fortunate enough to work alongside great people and on great projects, including working at the top NeuroImaging lab in the US at my alma mater USC as well as developing Drug Discovery Deep Learning models at NVIDIA.

Let's Discover Drugs using Deep Learning

Ramon Perez

Hello! I'm Ramon, a data scientist, researcher, and educator living in Sydney. I currently work as a Senior Product Developer at Decoded, where I create custom data science tools, workshops, and training programs for clients in industries ranging from retail to finance. My previous roles have been at the intersection of education, data science, and research in the areas of entrepreneurship and strategy, alongside a few research ventures in consumer behavior and development economics in industry and academia, respectively. During my professional career, I've had the fortune of working with research teams dedicated to helping multinational companies understand their customers better via data-driven approaches ranging from A/B testing to machine learning. I also enjoy giving workshops and have had the honor of participating in PyCon (US, APAC, and Chile), SciPy (US and Japan), and countless Meetup events. In my spare time, I enjoy cycling, playing baseball, and exploring many of the outdoor wonders Australia has to offer.

Workflows Deep Dive: From Data Engineering to Machine Learning

Ray Bell

Ray Bell is currently a Principal Data Scientist and Data Science Manager at DTN. Ray's role involves implementing advanced data analysis for business strategy for Weather, Energy and Agriculture.

Ray has experience in implementing novel solutions to big problems in sectors such as Hospitality, Defense as well as Oil and Gas. Ray has developed data mining techniques on High Performance Computers and machine learning algorithms to optimize forecasts.

Ray is a Project Management Professional, holds a Ph.D. in climate science from the University of Reading, England, and is a Software Carpentry instructor. Ray became a level 5 chartered manager with the Chartered Management Institute in 2015 and was a chartered member of the Institute of Marine Engineering, Science & Technology while working at BP.

Lightning Talks

Rehan Durrani

Rehan Durrani is one of the core developers of Modin and a founding engineer at Ponder. Rehan studied Electrical Engineering and Computer Science at UC Berkeley and has contributed to leading open source research projects including Modin and Ray, as well as led open source research projects like Clipper. His work on Modin has been published in leading publications, including VLDB.

How to maximally parallelize the entire pandas API

Richard Lee

Lightning Talks

Rosana de Oliveira Gomes

Rosana is a Senior Data Scientist in the energy sector. She transitioned to industry after a long career in academic research in Astrophysics. In her free time, she volunteers as a machine learning engineer and project manager in AI for good initiatives, working with both startups and NGOs on projects related to sustainability, among other topics. Rosana is also an advocate of inclusion in tech, mentoring women and minorities into tech careers. She founded the AI Wonder Girls team, an award-winning all-female team of data scientists who take part in hackathons and other projects about social impact.

A dive into time series for the energy sector

Roshini Sudhaharan

Research Master's student in Marketing at Tilburg University | RA at Tilburg Science Hub

Lightning Talks

Rosio Reyes

Rosio Reyes is a Software Engineer at Anaconda working as a part of the OSS Jupyter team. She is a Jupyter Notebook Steering Council Member and contributes to the Notebook and NbClassic projects.

An Evolving Jupyter Notebook

SARADINDU SENGUPTA

I am an ML Engineer at Nunam where I build learning systems to keep li-ion batteries in EVs safe, sustainable and performant.

Things I learned running neural networks on microcontrollers
Lightning Talks

SHASHANK SHEKHAR

Shashank is a Data Sciences leader with diverse experience across verticals including Telecom, CPG, Retail, Hitech and E-commerce. He is currently heading the Artificial Intelligence Labs at Subex. In the past, he has worked at VMware, Amazon, Flipkart and Target and has been involved in solving various complex business problems using Machine Learning and Deep Learning. He has been part of the program committee of several international conferences like ICDM and MLDM and was selected as a mentor in Global Datathon 2018 organized by Data Sciences Society. He has multiple patents and publications in the field of artificial intelligence, machine learning, deep learning and image recognition in several reputed international journals. He has spoken at many summits and conferences like PyData Global, APAC Data Innovation Summit, Big Data Lake Summit, PlugIn etc. He has also published three open-source libraries on Python and is an active contributor to the global Python community.

Measurement of Trust in AI
Generate Actionable Counterfactuals using Multi-objective Particle Swarm Optimization

Sammy Sidhu

Sammy Sidhu is co-founder and CEO of Eventual. Sammy's background is in High Performance Computing (HPC) and Deep Learning, and he has over a dozen patents/publications in the space. In the past, he has worked on high frequency trading on Wall Street, medical AI research at Berkeley and self-driving cars at both DeepScale (acquired by Tesla) and Lyft Level 5 (acquired by Toyota). Native to the Bay Area, Sammy graduated from UC Berkeley with a degree in Electrical Engineering and Computer Science.

Daft: the Distributed Python Dataframe for "Complex Data" (images, video, documents and more)

Sandy Ryza

Sandy works at Elementl as the lead engineer for the Dagster project. Previously, he led machine learning and data science teams at KeepTruckin and Clover Health. He's a committer on Spark and Hadoop, and co-authored O'Reilly's Advanced Analytics with Spark.

Data pipelines != workflows: orchestrating data with Dagster

Sanket Verma

Sanket is a data scientist based out of New Delhi, India. He likes to build data science tools and products and has worked with startups, government and organisations. He loves building community and bringing everyone together and is Chair of PyData Delhi and PyData Global. Currently, he's taking care of the community and OSS at Zarr as their Community Manager.
When he’s not working, he likes to play the violin and computer games and sometimes thinks of saving the world!

The Beauty of Zarr

Sarah Kaiser (She/Her)

Sarah has spent much of her career developing cutting-edge technologies in the lab with the best lab partner, Python. She can start plasma fires with lasers, image things with diamonds, and detect single photons in space, all controlled remotely by Jupyter Notebooks 💖. She loves communicating what is so exciting about tech by building new open source tools, communities, and writing books for all audiences. When not at her split ergo keyboard, she loves boating in the Seattle area, laser cutting everything, and playing with her dog Chewie.

Level up your Jupyter Notebooks with VS Code

Sean Sheng

Sean Sheng is the Head of Engineering at BentoML supporting product design, development and roadmap. Prior to joining, he led engineering teams in the Service Infrastructure org at LinkedIn responsible for building the platform that powers all backend distributed services at LinkedIn.

Building an ML Application Platform from the Ground Up

Shir Chorev

Shir is the co-founder and CTO of Deepchecks, an MLOps startup for continuous validation of ML models and data. Previously, Shir worked at the Prime Minister’s Office and at Unit 8200, conducting and leading research in various Machine Learning and cyber-related problems. Shir has a B.Sc. in Physics from the Hebrew University, which she obtained as part of the Talpiot excellence program, and an M.Sc. in Electrical Engineering from Tel Aviv University. Shir was selected as a featured honoree in the Forbes Europe 30 under 30 class of 2021.

How to Properly Test ML Models & Data

Shivay Lamba

Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor, and has also been an MLH Fellow. He has also interned at organizations like EY and Genpact.
He is actively involved in community work as well. He is a TensorflowJS SIG member, a Mentor in OpenMined, the CNCF Service Mesh Community and the SODA Foundation, and has given talks at various conferences like GitHub Satellite, Voice Global, Fossasia Tech Summit, and TensorflowJS Show & Tell.

Lightning Talks

Shrabastee Banerjee

I'm an Assistant Professor of Marketing at the Tilburg School of Economics and Management. I am broadly interested in online marketplaces and e-commerce. Particularly, I aim to look at how consumers make use of various cues in an e-commerce setting, and how these cues might have an impact on decision making. Examples include user-generated content such as reviews/ratings, non-focal prices advertised by a platform on their product page, and recommender systems. The primary methodologies I use are causal inference, experiments/quasi experiments and applied machine learning. In a separate stream of projects, I am also interested in applications of digitization for equity and development.

I received my PhD in Marketing from Boston University, where I was also a Rafik Hariri Graduate Fellow. Prior to that, I did my B.Sc (Calcutta University) and M.Sc (Warwick University, as a Commonwealth scholar) in Economics.

Lightning Talks

Sidra Effendi

Sidra Effendi is a Master's student at the University of Michigan, Ann Arbor in the School of Information. Her areas of interest are Information Retrieval, NLP, and data visualization. She has prior experience working as a Software Engineer and had her own startup, which gave her exposure to UX and market research. At the University of Michigan, she developed a search engine funded by Microsoft and an NLP-based digital curation assistant for UN ReliefWeb.

Urdu poems to Shakespearean English - Machine Translation

Srikanth

Welcome! I am Srikanth Komala Sheshachala.

    Interests:
    Interpretable machine learning
    Causal inference
    Setting up, running and inferring from A/B tests and multi-armed bandits
    Optimization (Operations Research)
    Graph theoretic approaches to problems in data science
    Spatio-temporal analysis
    Multivariate statistics
    ...

Day Job: Staff Data Scientist at Walmart Global Tech, India.

Lightning Talks

Srivatsa Kundurthy

Srivatsa Kundurthy is a student based in the Greater New York City Area. As a Python practitioner, his projects include Open Source Intelligence tools for extracting public data and Python notebooks for explaining and simulating chaotic dynamical systems. His work in machine learning includes studying computer vision applications and researching neural networks for predicting states of chaotic dynamical systems. Additionally, he is working with the LAION Research Group and has co-authored LAION-5B, the world's largest open-source image-text dataset and the source dataset for Stable Diffusion. Apart from Machine Learning Research, Srivatsa is greatly interested in technology policy and community-related issues, particularly those extending to the accessibility of programming education. On the side, Srivatsa enjoys science communication and stargazing.

Lightning Talks

Stefan Krawczyk

A hands-on leader and Silicon Valley veteran, Stefan has spent the last 15 years working on data and machine learning systems at companies like Stitch Fix, Nextdoor and LinkedIn.

Most recently, Stefan led the Model Lifecycle team at Stitch Fix. Its mission was to streamline the model productionization process for 100+ data scientists and machine learning engineers. The infrastructure they built created and tracked tens of thousands of models, and provided automated deployment that adheres to MLOps best practices.

A regular conference speaker, Stefan has guest lectured at Stanford’s Machine Learning Systems Design course and is an author of a popular open source framework called Hamilton.

Scalable Feature Engineering with Hamilton

Suliman Sharif

I am an odd blend of organic chemist, computer scientist, and philanthropist. Through education and research, I have been part of many chemistry labs, from natural products to inorganic metal frameworks to medicinal chemistry. As a result of my research in chemistry, and with the majority of my undergraduate friends being software engineers, I picked up computer science and took the classes. My time in industry has been spent at a couple of high-throughput research hospitals, building startup life science tech companies, and in real estate.

At my most recent startup, L7 Informatics, I learned how to take on the roles of leader, scientist and software engineer, integrating experiment workflows into our application and developing software to meet the needs of the customers. I then moved into a junior DevOps engineer role to understand large-scale infrastructure management.

I left the start-up scene to revisit chemistry and computer science in the context of Force Fields under the MacKerell Group at the University of Maryland School of Pharmacy PhD program. On my daily grind, I use a blend of different fields to expand chemical space and explore new avenues of technology including quantum mechanics, data visualization, and cloud infrastructure.

The Pythonic Common Chemical Universe

Takuya Ueshin

Takuya Ueshin is a software engineer at Databricks, and an Apache Spark committer and a PMC member. His main interests are in Spark SQL internals as well as PySpark. He is one of the major contributors of pandas API on Spark, a.k.a. the Koalas project.

Scale Data Science by Pandas API on Spark

Ted Conway

Ted Conway is currently a Data Analyst working with Big Data in the financial sector. Ted studied Computer Science at the University of Illinois (UIUC, BS LAS) and DePaul University (MS CS).

Lightning Talks

Thomas Dohmke

CEO of GitHub. Fascinated by software development since his childhood in Germany, Dr. Thomas Dohmke has built a career building tools developers love and accelerating innovations that are changing software development. Currently, Thomas is Chief Executive Officer of GitHub, where he drives the company’s core mission of making GitHub the home for all developers. Before his time at GitHub, Thomas co-founded HockeyApp and led the company as CEO through its acquisition by Microsoft in 2014. He holds a PhD in mechanical engineering from the University of Glasgow, UK.

Keynote - Thomas Dohmke

Tiffany Chu

Tiffany is a data scientist at Caltrans, working on the Integrated Travel Project team. Prior to Caltrans, she did transportation and data-related work at the City of Los Angeles, Cambridge Systematics, and LA Metro.

The Dask at Hand: Using Dask to Speed up the High Quality Transit Areas dataset for the CA Open Data Portal.

Tom Mock

Thomas is the Customer Enablement Lead at Posit (formerly known as RStudio), helping Posit’s users be as successful as possible. He is deeply involved in the global data science community, sharing tips on Twitter and Mastodon (find him at @thomas_mock or fosstodon.org/@thomas_mock), as co-founder of TidyTuesday, a weekly Data Science learning challenge, and presenting on various Data Science topics on YouTube or at conferences.

Reproducible Publications with Python and Quarto

Tonya Sims

Tonya is a former Professional Basketball player turned Python enthusiast. She is currently a Python Developer Advocate for Deepgram, a speech-to-text company that has revolutionized the market. Her path to Python is unconventional. Her career started in athletics and then transitioned to pharmaceutical sales. She finally landed in her destination spot, the tech industry. Driven by her passion for teaching, she takes pride in helping others and loves connecting with her fellow Pythonistas! Outside of coding, Tonya enjoys all things sports. She is also an avid reader who loves writing and spending time with her nieces and nephews.

Discover Inspirational Insights in Motivational Sports Speeches Using Speech-to-Text

Topaz Gilad

Topaz Gilad is an R&D manager specializing in AI, machine learning, and computer vision, leading production-oriented innovative research.
With experience in large companies as well as startups, in various industries, from space imaging and semiconductor microscopy to sports tech, wellness, beauty, and self-care industry, she has developed methodologies to scale up while improving quality, delivery, and teamwork.

Currently VP of AI and Algorithms at Voyage81, an innovation company that excels in computer vision deep learning algorithms in both RGB and hyper-spectral domains. Previously head of AI at Pixellot, a leading AI-automated sports production company.

Topaz is also an advocate for women in tech. When she is not building algorithmic teams, she enjoys painting.

Bon Voyage! Leading machine learning research journeys with happy (into-production) endings
Classification Through Regression: Unlock the True Potential of Your Labels

Valerio Maggio

Valerio Maggio is a Researcher and Data scientist, currently working at Anaconda, Inc. as a Senior Developer Advocate. Valerio is also a member of the Software Sustainability Institute, with a fellowship focused on Privacy Preserving methods for Data Science and Machine Learning. Valerio is an active member of the Python community: over the years he has led the organisation of many international conferences like PyCon/PyData Italy, EuroPython, and EuroSciPy. In his free time, Valerio is a casual "Magic: The Gathering" player of the Premodern format, enjoying playing with friends all over the world, and contributing to the community.

Real-world Perspectives to Avoid the Worst Mistakes using Machine Learning in Science

Victoria Slocum

Victoria recently graduated from UC San Diego with a degree in linguistics and a passion for natural language processing. She got involved with coding and Python after creating several applied NLP projects, like a playlist recommender based on a user-inputted quote.

She is just starting her career as a Developer Advocate for Explosion, the makers of spaCy! In this role, she takes care of the NLP-focused community around spaCy through example projects, videos, visuals, and posts.

She is in love with learning more about NLP and ensures that the open-source community has everything they need to do the same. Besides running marathons, making fun projects, and challenging her understanding of the world, she devotes all of her passion and motivation to educating the community around Explosion.

Is it possible to have entities within entities within entities?

Xinrong Meng

Xinrong Meng is a software engineer at Databricks and an Apache Spark committer, focusing on PySpark. She is one of the major contributors to Pandas API on Spark.

Scale Data Science by Pandas API on Spark

Yannis Katsis

Yannis Katsis is a Senior Research Scientist at IBM Research, Almaden with expertise in the management, integration, and extraction of knowledge from structured, semi-structured, and unstructured data. In his recent work, Yannis focuses on lowering the barrier of entry to knowledge extraction by designing, analyzing, and building human-in-the-loop systems that enable domain experts to interactively generate knowledge extraction AI models that serve their needs. Yannis received his PhD in Computer Science from UC San Diego. His work has appeared in top conferences and journals in the areas of data management, natural language processing, and human-computer interaction, and has been leveraged for multiple IBM products as well as open source software.

Create text classifiers in a few hours using the open-source, no-code Label Sleuth system

Yashasvi Misra

I am Yashasvi Misra, a recent computer science engineering graduate currently working as an Associate Data Scientist - 1 at ABInBev India. I am enthusiastic about exploring and implementing new tech stacks, with a good background in working on research projects, and I am a recipient of an Excellence award from Samsung Research India. I am extremely passionate about engaging in open source communities and contributing to diversity & inclusion.

Implementation and analysis of deep learning models for codeswitched speech classification

Zander

Zander is the CEO and Founder of Bytewax, a Python stream processing framework. Prior to Bytewax, he worked as a data scientist at GitHub and Heroku. He lives in Santa Cruz, California and, when not at his computer, likes to get outdoors.

Anomaly Detection on Streaming Data in Python using Bytewax and River

Ziheng Wang

Tony got his BS and MS from a cold dark place called MIT, where he spent years learning the archaic language Verilog. After graduating, he briefly did a startup writing assembly speeding up machine learning model inference. But after discovering most of his prospective customers spend more time loading inputs from a database than actually running the models, he decided to pursue a PhD at Stanford on big data processing.

Lightning Talks

Zornitsa Manolova

Zornitsa Manolova leads the Data Quality Management and Data Science team at the Global Legal Entity Identifier Foundation (GLEIF). Since April 2018, she has been responsible for enhancing and improving the established data quality and data governance framework by introducing innovative data analytics approaches. Previously, Zornitsa managed forensic data analytics projects on international financial investigations at PwC Forensics. She holds a German Diploma in Computer Sciences with a focus on Machine Learning from the Philipps University in Marburg.

Lessons Learned Building Our Own Dashboard Solution Using Open-Source Technologies

Samuel Oranyeli

Data Engineer, love open source, contributor to pydatatable and pyjanitor.

Inequality Joins in Pandas with Pyjanitor