Scaling ML Serving to 1000s of Models

by Datadog

Breakout session

Gerard Casas Saez

Senior ML Engineer | Cash App

Date & Location

June 26 | 2:50 PM EDT | Room 405.2

Join the Cash App engineering team for a discussion of strategies for scaling ML serving to thousands of models. In this talk, Gerard Casas Saez (Senior Machine Learning Engineer) shares how Cash App optimized its platform, focusing on ONNX model performance, hot container replacements, and automated, streamlined model deployments. Learn about the enhancements made to AWS SageMaker Multi-Model Endpoints, including zero-downtime upgrades, and the process improvements that accelerate productionization through a custom Python client and robust approval workflows.

Gerard will also discuss Cash App’s approach to managing AWS SageMaker endpoints as a unified team, highlighting techniques to minimize on-call disruptions and manage services without becoming a bottleneck. He will also share where the platform is headed, including plans for hosting large language models and ongoing optimization efforts.

Attendees will leave with a clear understanding of best practices for ONNX serving, strategies for reducing deployment times, and techniques to enhance monitoring and stability. This session is essential for professionals looking to scale their ML operations effectively in a cost-sensitive and high-demand environment.
