PyData Global 2022

Everything you need to know about Transformer Models
12-02, 21:30–22:00 (UTC), Talk Track I

Transformer models are everywhere in the deep learning community, and this talk will help you better understand why they achieve such impressive results. Using various explainability techniques and plain numpy examples, participants will gain an understanding of the attention mechanism, its implementation, and how it all comes together.


Transformer-based models have revolutionized Natural Language Processing and are also increasingly applied in Computer Vision. For example, they achieve impressive results in generating images and translating text. Many people probably remember the sometimes strange or funny translations produced by Google Translate and similar services, which are far more accurate nowadays, thanks to the transformer models working behind the scenes.

The key components of a transformer are so-called attention heads, which are often claimed to mimic the way the human brain processes texts and images. Mathematically, attention is not much more than matrix multiplication, but what’s actually going on here? How can stacked layers of attention heads learn to achieve such impressive results?
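To make the "not much more than matrix multiplication" point concrete, here is a minimal sketch of a single attention head in plain numpy. It assumes the standard scaled dot-product formulation, softmax(QK^T / sqrt(d_k)) V, and is not taken from the talk itself; the function names, the random weights and the toy sizes (4 tokens, 8 dimensions) are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    # Scaled dot-product attention for one head (assumed standard form).
    d_k = queries.shape[-1]
    # One matrix multiplication gives a similarity score per token pair.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Each row becomes a probability distribution over the input tokens.
    weights = softmax(scores, axis=-1)
    # A second matrix multiplication mixes the value vectors accordingly.
    return weights @ values, weights


# Toy input: 4 tokens with 8-dimensional embeddings (illustrative sizes).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))

# Random projections standing in for learned weights (an assumption).
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

output, weights = attention(tokens @ W_q, tokens @ W_k, tokens @ W_v)
```

Each row of `weights` sums to 1 and can be read as how strongly one token attends to every other token, which is also the quantity that attention visualizations typically plot.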

In this talk, we will take a practical approach to understanding the inner workings of transformers by implementing a basic example in numpy and visualizing how attention works.

This talk is aimed at a general audience. No in-depth knowledge of maths or machine learning is required to follow this talk. Aside from familiarity with the basics of numpy, all you need is your curiosity.


Prior Knowledge Expected

No previous knowledge expected

Mike is a Data Scientist specializing in NLP and Explainable AI. He is currently working on his Master's thesis on generating user-centric explanations for bi-modal Transformers.