PyData Global 2022

Zander

Zander is the CEO and founder of Bytewax, a Python stream processing framework. Before Bytewax, he worked as a data scientist at GitHub and Heroku. He lives in Santa Cruz, California, and when not at his computer likes to get outdoors.

Sessions

12-01
15:00
90min
Anomaly Detection on Streaming Data in Python using Bytewax and River
Zander

Bytewax is an open-source, Python-native framework and distributed processing engine for data streams. It makes it easy to build everything from data-anonymization pipelines to more sophisticated systems for fraud detection, personalization, and more. In this tutorial, we will cover how to use Bytewax with the Python library River to build an online machine learning system that detects anomalies in IoT data arriving from streaming systems like Kafka and Redpanda. The tutorial is for data scientists, data engineers, and machine learning engineers interested in machine learning and streaming data. By the end of the session you will know how to:
- run a streaming platform like Kafka or Redpanda in a Docker container
- develop a Bytewax dataflow
- run a River anomaly detection algorithm to detect anomalous data
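
To give a feel for what "online" anomaly detection means before the session, here is a minimal pure-Python sketch of the score-then-learn loop applied to a stream of sensor readings. It is an illustration under stated assumptions, not the tutorial's code: `RunningZScore`, `detect`, `threshold`, and `warmup` are hypothetical names, the running z-score is a stand-in for a River detector, and the plain `for` loop stands in for a Bytewax dataflow reading from Kafka or Redpanda.

```python
import math


class RunningZScore:
    """Hypothetical stand-in detector: running mean/variance via Welford's method."""

    def __init__(self, warmup=30):
        self.warmup = warmup  # observations required before scoring
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score_one(self, x):
        # Absolute z-score of x against the statistics seen so far.
        if self.n < self.warmup:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        # Update the running statistics with one observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def detect(readings, threshold=4.0):
    """Yield (index, value, score) for readings scoring above the threshold."""
    model = RunningZScore()
    for i, x in enumerate(readings):
        score = model.score_one(x)  # score first, using past data only
        model.learn_one(x)          # then update the model with the new point
        if score > threshold:
            yield i, x, score


# Simulated temperature readings (made-up data) with one injected spike.
readings = [20 + 0.5 * math.sin(i) for i in range(200)]
readings[150] = 35.0

alerts = list(detect(readings))
for i, x, score in alerts:
    print(f"reading {i}: value={x:.1f} score={score:.1f}")
```

Each reading is scored against what the model has seen so far and only then used to update it, so the model never needs the full dataset in memory. That property is what makes the approach suitable for unbounded streams.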

The tutorial material will be available in a GitHub repository, and the content will roughly follow the timeline below.

  • 0-10min - Introduction to stream processing and online machine learning
  • 10-30min - Set up the streaming system and prepare the data
  • 30-60min - Write the Bytewax dataflow and anomaly detector code
  • 60-90min - Tune the anomaly detector and run the Bytewax dataflow successfully
Workshop/Tutorial II