PyData Global 2022

Building an ML Application Platform from the Ground Up
12-03, 08:30–09:00 (UTC), Talk Track I

The value of an ML model is not realized until it is deployed and served in production. Building an ML application is more challenging than building a traditional application because of the added complexities of models and data on top of the application code. Using a web serving framework (e.g. FastAPI) can work for simple cases but falls short on performance and efficiency. Alternatively, a pre-packaged model server (e.g. Triton Inference Server) can be ideal for low-latency serving and resource utilization, but lacks the flexibility to define custom logic and dependencies. BentoML abstracts these complexities by creating separate runtimes for IO-intensive preprocessing logic and compute-intensive model inference logic. At the same time, BentoML offers an intuitive and flexible Python-first SDK for defining custom preprocessing logic, orchestrating multi-model inference, and integrating with other frameworks in the MLOps ecosystem.
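The split between an IO-bound API layer and a compute-bound inference runtime can be sketched in plain Python. This is a conceptual illustration using only the standard library, not BentoML's actual SDK: BentoML dispatches inference to dedicated runner runtimes (typically separate processes), whereas a thread pool and a toy "model" stand in for them here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def model_inference(batch):
    # Stand-in for compute-intensive model inference; in BentoML this
    # work would run in a separate runner runtime, not inline.
    return [2 * x for x in batch]


async def handle_request(executor, raw):
    # IO-intensive preprocessing stays on the event loop, where many
    # requests can be parsed and validated concurrently.
    batch = [float(x) for x in raw.split(",")]
    loop = asyncio.get_running_loop()
    # Dispatch inference to the separate executor so it never blocks IO.
    return await loop.run_in_executor(executor, model_inference, batch)


async def serve(requests):
    # Handle all requests concurrently while sharing one inference pool.
    with ThreadPoolExecutor(max_workers=2) as executor:
        return await asyncio.gather(
            *(handle_request(executor, r) for r in requests)
        )


print(asyncio.run(serve(["1,2,3", "4,5"])))  # [[2.0, 4.0, 6.0], [8.0, 10.0]]
```

The design point is that the two workloads scale independently: the event loop absorbs many cheap IO-bound requests, while the bounded executor keeps expensive inference from starving it.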


BentoML is an open source ML application platform that simplifies model packaging and management, optimizes model serving workloads to run at production scale, and accelerates the creation, deployment, and monitoring of prediction services. BentoML has an active worldwide community of over 1600 data scientists, engineers, and ML practitioners. After studying hundreds of model serving use cases in our community, we would like to share the learnings and considerations that went into building and evolving BentoML. The talk begins with simple use cases and the considerations behind choosing programming languages and frameworks, then expands into complex requirements such as performance, resource utilization, multi-model orchestration, monitoring, and feedback cycles. It will also discuss how BentoML addresses these challenges, with real-world examples.


Prior Knowledge Expected

No previous knowledge expected

Sean Sheng is the Head of Engineering at BentoML, supporting product design, development, and roadmap. Prior to joining BentoML, he led engineering teams in the Service Infrastructure org at LinkedIn, responsible for building the platform that powers all of the company's backend distributed services.