PyData Global 2022

sktime - python toolbox for time series: pipelines and transformers
12-01, 11:30–13:00 (UTC), Workshop/Tutorial I

sktime is a widely used scikit-learn compatible library for learning with time series. sktime is easily extensible by anyone, and interoperable with the pydata/numfocus stack. sktime has a rich framework for building pipelines across the multiple learning tasks that it supports, including forecasting, time series classification, regression, and clustering. This tutorial explains basic and advanced sktime pipeline constructs, and introduces in detail the time series transformer, which is the main component in all types of pipelines. It is a continuation of the sktime introductory tutorial at PyData Global 2021.


In time series analysis, multiple, sometimes repetitive, algorithmic steps are often applied to the data. Organising these steps clearly, so that they can be deployed flexibly on multiple data sets and results can be reproduced easily, is a challenge. Pipelines offer a solution by providing a structure for building flexible sequences of time series algorithms. The modular building blocks of pipelines are "transformers" or "transformations" (in the scikit-learn sense) as well as estimators specific to learning tasks, such as forecasters or time series classifiers. A particular difficulty in learning with time series is that there are many different types of transformations, such as:

  • transformers of a time series to time series, e.g., differencing and detrending
  • transformers of a time series to a row of primitive features/values in a data frame, e.g., time series summary
  • transformers of a time series to a panel of time series, e.g., bootstrap, sliding window
  • transformers that apply to hierarchical time series, e.g., reconciliation or hierarchical aggregation
  • transformers of a pair of time series to a real number, e.g., time series distances or kernels
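To make the distinction between these types concrete, here is a minimal plain-Python sketch of four of them, written as functions whose signatures differ in what goes in and what comes out. The function names and the list-based `Series` alias are assumptions made for illustration only; they are not sktime's API, and hierarchical transformers are left out for brevity.

```python
from typing import Dict, List

# Assumption for this sketch: a time series is just a list of floats.
Series = List[float]


def difference(y: Series) -> Series:
    """Series to series: first differences."""
    return [b - a for a, b in zip(y, y[1:])]


def summarize(y: Series) -> Dict[str, float]:
    """Series to a row of primitive features."""
    return {"mean": sum(y) / len(y), "min": min(y), "max": max(y)}


def sliding_window(y: Series, width: int) -> List[Series]:
    """Series to a panel (collection) of series."""
    return [y[i:i + width] for i in range(len(y) - width + 1)]


def euclidean_distance(a: Series, b: Series) -> float:
    """Pair of equal-length series to a real number."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5


y = [1.0, 2.0, 4.0, 7.0]
diffs = difference(y)          # one series out
features = summarize(y)        # one row of features out
panel = sliding_window(y, 2)   # several series out
dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # one number out
```

The point of the sketch is only that the output kind changes from function to function, which is why a single undifferentiated "transformer" interface is not enough.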

sktime provides a framework to distinguish the above, and to use transformers of the various types as components in different types of pipelines, such as:

  • forecasting pipelines, with transformers applied to endogenous, exogenous, or output data,
  • time series classification pipelines, with transformers applied to inputs,
  • compositor pipelines for time series distances or parameter estimators,
  • specialized reduction steps consuming different types of transformers and machine learning estimators,
  • and many more.
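As an illustration of the first pipeline type, the following plain-Python sketch applies a transformer to the endogenous (target) series, fits a forecaster on the transformed data, and inverts the transformation on the forecast. All class names here (`Differencer`, `MeanForecaster`, `TargetPipeline`) are hypothetical stand-ins written for this example; sktime's own pipeline classes are considerably more general.

```python
from typing import List


class Differencer:
    """Hypothetical series-to-series transformer: first differences."""

    def fit(self, y: List[float]) -> "Differencer":
        # Remember the last observation so forecasts can be un-differenced.
        self.last_ = y[-1]
        return self

    def transform(self, y: List[float]) -> List[float]:
        return [b - a for a, b in zip(y, y[1:])]

    def inverse_transform(self, diffs: List[float]) -> List[float]:
        # Cumulatively add forecast differences onto the last observation.
        out, level = [], self.last_
        for d in diffs:
            level += d
            out.append(level)
        return out


class MeanForecaster:
    """Hypothetical forecaster: predicts the training mean at every step."""

    def fit(self, y: List[float]) -> "MeanForecaster":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.mean_] * horizon


class TargetPipeline:
    """Transform the target, forecast, then invert the transformation."""

    def __init__(self, transformer, forecaster):
        self.transformer = transformer
        self.forecaster = forecaster

    def fit(self, y: List[float]) -> "TargetPipeline":
        self.transformer.fit(y)
        self.forecaster.fit(self.transformer.transform(y))
        return self

    def predict(self, horizon: int) -> List[float]:
        raw = self.forecaster.predict(horizon)
        return self.transformer.inverse_transform(raw)


y = [10.0, 12.0, 14.0, 16.0]
pipe = TargetPipeline(Differencer(), MeanForecaster()).fit(y)
forecast = pipe.predict(3)
print(forecast)  # [18.0, 20.0, 22.0]
```

Because the pipeline exposes the same `fit`/`predict` interface as the forecaster it wraps, it can itself be used wherever a forecaster is expected, which is what makes pipelines composable.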

The design challenge is to formalize transformers in such a way that a given type of transformer can be used in multiple types of pipeline, and to create pipelines that can use multiple types of transformers. sktime solves this challenge through the "scientific type" formalism, which applies object-orientation-based typing to the transformers and their inputs/outputs. The presentation will also briefly touch on advanced pipelining concepts such as graph pipelines, and on roadmap items inviting contributions.
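The idea of typing transformers by their inputs and outputs can be sketched in a few lines of plain Python. In this hedged illustration, each transformer declares an input and an output type label, and a composition helper refuses to chain steps whose labels do not match. The names `TypedTransformer` and `compose`, and the labels "Series" and "Primitives", are assumptions chosen for this sketch and do not reproduce sktime's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TypedTransformer:
    """Hypothetical transformer carrying input/output type labels."""
    name: str
    input_type: str
    output_type: str
    func: Callable[[Any], Any]


def compose(steps: List[TypedTransformer]) -> Callable[[Any], Any]:
    """Chain steps, checking that each output type feeds the next input type."""
    for left, right in zip(steps, steps[1:]):
        if left.output_type != right.input_type:
            raise TypeError(
                f"{left.name} outputs {left.output_type}, "
                f"but {right.name} expects {right.input_type}"
            )

    def run(x: Any) -> Any:
        for step in steps:
            x = step.func(x)
        return x

    return run


difference = TypedTransformer(
    "difference", "Series", "Series",
    lambda y: [b - a for a, b in zip(y, y[1:])],
)
mean_feature = TypedTransformer(
    "mean_feature", "Series", "Primitives",
    lambda y: sum(y) / len(y),
)

# Series -> Series -> Primitives is a valid chain.
featurizer = compose([difference, mean_feature])
feature = featurizer([1.0, 3.0, 6.0])

# Primitives -> Series is not, so composition is rejected up front.
try:
    compose([mean_feature, difference])
    rejected = False
except TypeError:
    rejected = True
```

Checking types at composition time, rather than when data arrives, means an invalid pipeline fails early with a message that names the mismatched steps.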


Prior Knowledge Expected

No previous knowledge expected

Principal Data Scientist and Practice Lead at GfK.

Founder and core developer of the sktime python toolbox for machine learning with time series.

I completed a Master of Science degree in informatics in 2019 at the Karlsruhe Institute of Technology. I am working towards a PhD in Informatics at the Karlsruhe Institute of Technology. My research focuses on using deep generative models in energy systems and coping with concept drift in energy time series forecasting. Additionally, I investigate how a general pipeline architecture has to be designed for time series analysis tasks.

Currently pursuing a PhD in Computational Biology. Mirae joined the sktime team in the summer of 2022 as part of Google Summer of Code and has since stayed on as a core developer!

Learn more at: https://miraeparker.com/