PyData Global 2022

On creating behavioral profiles of your customers from event stream data: an introduction to Cleora, the open-source tool for real-time multimodal modeling.
12-02, 16:00–16:30 (UTC), Talk Track II

We present Cleora, an open-source tool for creating compact representations of your clients' behavior. Cleora uses graph theory to transform streams of event data into embeddings, which are suitable as input for training churn, propensity, and recommender models. The talk is useful for anyone who wants to learn how to work with client event data and how to model client behavior.


The objective of this talk is to introduce the audience to the open-source tool Cleora (https://github.com/Synerise/cleora), which processes and embeds large event data streams such as client purchases, clicks on a web page, or card transactions.

In many situations, data scientists struggle to create a good representation of clients for two reasons:

  • information comes from many different sources that cannot easily be combined, e.g. static attributes of a client and her clicks and purchases in the app;

  • processing such large data streams is resource-intensive, which makes working with them efficiently cumbersome.

We have observed that, to tackle this issue, data scientists usually transform per-client events into an aggregated tabular form. In doing so, they lose a lot of latent information.
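As a toy illustration of that information loss (hypothetical data and names, not taken from Cleora), two clients can produce identical aggregated rows while their raw event streams show completely different behavior:

```python
from collections import Counter

# Hypothetical purchase events as (client_id, product_id) pairs.
events = [
    ("alice", "running_shoes"), ("alice", "energy_gel"), ("alice", "water_bottle"),
    ("bob", "lipstick"), ("bob", "mascara"), ("bob", "perfume"),
]

# A typical aggregated tabular feature: the number of purchases per client.
purchase_counts = Counter(client for client, _ in events)

# The raw event stream still records *what* each client bought.
baskets = {}
for client, product in events:
    baskets.setdefault(client, set()).add(product)

# The aggregated rows are identical, yet the baskets share no products at all.
print(purchase_counts["alice"] == purchase_counts["bob"])  # True
print(baskets["alice"].isdisjoint(baskets["bob"]))         # True
```

A model trained only on `purchase_counts` cannot tell these two clients apart, even though their interests clearly differ.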

As a team of data scientists, we struggled with this challenge ourselves, and this is how Cleora came about. We have shown that it represents the behavior of a single client well: using exactly this solution, we reached the podium in several competitions, such as SIGIR and KDD Cup.

A Cleora embedding is effectively a compact profile of your client. It serves as input to a neural network model; we can think of Cleora as an embedding model for events. During the talk, we will show how to use these embeddings to predict customer churn, model propensity for certain products, and build real-time recommender systems.
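To make the "graph in, embedding out" idea concrete, here is a minimal NumPy sketch of one simple graph-embedding scheme in this spirit. Everything in it is an illustrative assumption, not Cleora's implementation or API: the hypothetical `embed_events` function builds a client-product graph from events, starts from random vectors, and repeatedly replaces each node's vector with the average of its neighbours' vectors (multiplication by the row-normalized adjacency matrix), L2-normalizing after every step.

```python
import numpy as np

# Hypothetical purchase events as (client_id, product_id) pairs.
events = [
    ("alice", "running_shoes"), ("alice", "energy_gel"),
    ("bob", "running_shoes"), ("bob", "water_bottle"),
    ("carol", "lipstick"), ("carol", "mascara"),
]


def embed_events(events, dim=64, iterations=3, seed=0):
    """Illustrative embedding of a client-product graph (hypothetical helper).

    Not Cleora's API: a sketch of iterative neighbour averaging over a
    row-normalized adjacency matrix with L2 normalization after each step.
    """
    # Clients and products share one node index space.
    nodes = sorted({name for edge in events for name in edge})
    index = {name: i for i, name in enumerate(nodes)}

    # Symmetric adjacency matrix with one undirected edge per event.
    adj = np.zeros((len(nodes), len(nodes)))
    for client, product in events:
        i, j = index[client], index[product]
        adj[i, j] += 1.0
        adj[j, i] += 1.0

    # Row-normalize: each row becomes a random-walk transition distribution.
    transition = adj / adj.sum(axis=1, keepdims=True)

    # Random initial vectors, then propagate and re-normalize each iteration.
    rng = np.random.default_rng(seed)
    emb = rng.uniform(-1.0, 1.0, size=(len(nodes), dim))
    for _ in range(iterations):
        emb = transition @ emb
        emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    return {name: emb[index[name]] for name in nodes}


embeddings = embed_events(events)
client_profile = embeddings["alice"]  # a compact, unit-length client vector
```

In this toy graph, alice and bob share a product, so their vectors tend to end up closer to each other than to carol's. The dense matrix is used only for readability; event graphs at real scale would need a sparse representation. Per-client vectors like `client_profile` are what would then be fed as input features to churn, propensity, or recommendation models.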

  • This talk will interest all machine learning engineers who work with data streams and model user behavior.

  • It will be a hands-on talk with practical examples; we will also present the underlying technology, so you can expect a very gentle introduction to graphs.

  • After the talk, you will be able to create your own compact user embeddings and use them as input to machine learning models.

  • Prior experience with Python and with building machine learning models is expected.


Prior Knowledge Expected: Previous knowledge expected

AI engineer and scientist with practical experience in creating and implementing solutions based on artificial intelligence. On a daily basis, she works with companies that want to use AI in their products: she helps them diagnose their needs and translate them into technical requirements. As an AI Product Owner, she has managed projects that implement AI in products. Her research interests lie in the interpretability of neural networks. As part of her PhD research, she completed research internships at Nanyang Technological University in Singapore and at the University of California, Davis. She currently works as Applied Data Science Lead at Synerise.

AI researcher focused on recommender systems, natural language processing, and interpretability of deep learning models. She works as AI Research Director at Synerise. She is a laureate of many international AI competitions, e.g. SemEval, the SIGIR Rakuten Challenge, the WSDM Booking.com Challenge, and the Twitter RecSys Challenge, alongside teams from Google DeepMind, Baidu, IBM, Amazon, and NVIDIA. She is active in many international research networks, working with teams at MI2Datalab at Warsaw University of Technology, Nanyang Technological University in Singapore, Oxford University, and Jagiellonian University.