Sean Sheng
Sean Sheng is the Head of Engineering at BentoML, overseeing product design, development, and roadmap. Prior to joining BentoML, he led engineering teams in the Service Infrastructure organization at LinkedIn, building the platform that powers all of LinkedIn's backend distributed services.
Sessions
The value of an ML model is not realized until it is deployed and served in production. Building an ML application is more challenging than building a traditional application because models and data add complexity on top of the application code. Web serving frameworks (e.g., FastAPI) can work for simple cases but fall short on performance and efficiency. Pre-packaged model servers (e.g., Triton Inference Server), on the other hand, can be ideal for low-latency serving and resource utilization but lack the flexibility to define custom logic and dependencies. BentoML abstracts these complexities by creating separate runtimes for IO-intensive preprocessing logic and compute-intensive model inference logic. At the same time, BentoML offers an intuitive and flexible Python-first SDK for defining custom preprocessing logic, orchestrating multi-model inference, and integrating with other frameworks in the MLOps ecosystem.
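To make the runtime separation concrete, here is a minimal sketch of a BentoML service, assuming the 1.x SDK and a scikit-learn model already saved to the local model store under the hypothetical name `iris_clf`. The API function handles IO-bound request processing in the API server, while the Runner executes compute-bound inference in its own runtime:

```python
import bentoml
from bentoml.io import JSON, NumpyNdarray

# Load the saved model as a Runner; Runners execute inference in a
# separate, compute-optimized runtime process.
# ("iris_clf" is an assumed example model tag, not from the abstract.)
iris_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

svc = bentoml.Service("iris_classifier", runners=[iris_runner])

@svc.api(input=NumpyNdarray(), output=JSON())
async def classify(input_array):
    # IO-intensive preprocessing and request handling run here, in the
    # API server; inference is dispatched asynchronously to the Runner.
    result = await iris_runner.predict.async_run(input_array)
    return {"prediction": result.tolist()}
```

Served with `bentoml serve`, the API server and the Runner scale independently, which is how the framework reconciles flexible Python preprocessing with efficient model inference.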