PyData Global 2022

Create text classifiers in a few hours using the open-source, no-code Label Sleuth system
12-01, 16:00–16:30 (UTC), Talk Track II

Domain experts often need to create text classification models; however, they may lack the ML or coding expertise to do so. In this talk, we show how domain experts can create text classifiers without writing a single line of code through the open-source, no-code Label Sleuth system (www.label-sleuth.org), which combines an intuitive labeling UI with active learning techniques and integrated model training functionality. Finally, we describe how the system can also benefit more technical users, such as data scientists and developers, who can customize it for more advanced usage.


What is the problem and who is the target audience?

  • Domain experts who want to create text classifiers but lack ML knowledge.
  • More technical users, including researchers, data analysts, and developers, who want to accelerate the development of text classifiers.

A quick overview of current alternatives will be given.

What is Label Sleuth?

An open-source, interactive, no-code system for text annotation and text classifier creation. A brief tour and demo of the system will be given, showcasing the UI and interaction flow from the viewpoint of the end user.

Underlying technologies & features

A brief explanation of Label Sleuth’s internals will be given, which include:
- Intuitive text annotation UI (built using React).
- Active learning techniques to focus the domain expert’s labeling effort on the most valuable examples.
- Integrated model training functionality, which iteratively trains improved versions of the model in the background without user intervention (internally leveraging PyTorch, scikit-learn, Hugging Face Transformers, spaCy, and NumPy).
- Pluggable architecture, allowing experimentation with different active learning algorithms and ML models.
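
To make the active learning and background retraining ideas above concrete, here is a minimal sketch of an uncertainty-sampling loop built directly on scikit-learn. It is not Label Sleuth's code or API: the function names (`select_most_uncertain`, `active_learning_loop`), the `ask_expert` callback standing in for the labeling UI, and the choice of TF-IDF features with logistic regression are all assumptions made for illustration. The `select` parameter hints at how a pluggable design could let a different selection strategy be swapped in.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def select_most_uncertain(model, X_pool, batch_size):
    """Return positions in X_pool whose positive-class probability is closest to 0.5."""
    positive_proba = model.predict_proba(X_pool)[:, 1]
    distance_from_boundary = np.abs(positive_proba - 0.5)
    return np.argsort(distance_from_boundary)[:batch_size]


def active_learning_loop(texts, ask_expert, seed_indices, rounds=3, batch_size=2,
                         select=select_most_uncertain):
    """Hypothetical loop: train, pick uncertain texts, ask for labels, retrain.

    ask_expert(i) stands in for a human labeling text i as 0 or 1.
    seed_indices must cover both classes so the first model can be trained.
    """
    X = TfidfVectorizer().fit_transform(texts)
    labeled = {i: ask_expert(i) for i in seed_indices}

    def train():
        # Retrain from scratch on everything labeled so far.
        idx = sorted(labeled)
        return LogisticRegression().fit(X[idx], [labeled[i] for i in idx])

    model = train()
    for _ in range(rounds):
        pool = [i for i in range(len(texts)) if i not in labeled]
        if not pool:
            break  # nothing left to label
        for position in select(model, X[pool], batch_size):
            labeled[pool[position]] = ask_expert(pool[position])
        model = train()
    return model, labeled
```

In this sketch, each round spends the expert's effort on the texts the current model is least sure about, then retrains, so the model improves without the expert having to label the whole collection. Passing a different function as `select` (for example, one that picks random positions) changes the strategy without touching the loop.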

Why Label Sleuth?

Success stories of using Label Sleuth will be provided, explaining how the system helped reduce the respective model creation effort and facilitated model creation for new audiences.


Prior Knowledge Expected

No previous knowledge expected

Yannis Katsis is a Senior Research Scientist at IBM Research, Almaden with expertise in the management, integration, and extraction of knowledge from structured, semi-structured, and unstructured data. In his recent work, Yannis focuses on lowering the barrier of entry to knowledge extraction by designing, analyzing, and building human-in-the-loop systems that enable domain experts to interactively generate knowledge extraction AI models that serve their needs. Yannis received his PhD in Computer Science from UC San Diego. His work has appeared in top conferences and journals in the areas of data management, natural language processing, and human-computer interaction, and has been leveraged for multiple IBM products as well as open source software.

I hold a PhD in computer science, in the field of Natural Language Processing and Machine Learning. I joined IBM Research for Project Debater, where I was responsible for its NLP pipeline and tools. I later became a team lead working on Argument Mining, Argument Quality, Weak Supervision, and Debate Rebuttal. Today I manage a group researching various language technologies.