Wednesday 17:35
Have you ever tuned a model to perfection, only to have it fail once integrated into your production pipeline? This is the "local optimization" trap: fixing a component while unintentionally breaking the complex system around it.
At Zalando, where we manage hundreds of forecasting models across 25 countries, local wins often lead to global failures. In this talk, we move beyond single-model tuning to explore Holistic Optimization.
We will detail how our team implemented a "Pipeline-as-a-Trial" architecture.
What We’ll Cover:
- An explanation of what the "local optimization" problem is, and how it appears everywhere from tech products to day-to-day life.
- How we leveraged Ray’s distributed capabilities to manage high-concurrency Machine Learning workloads.
- Infrastructure Comparison: A candid, battle-tested breakdown of running HPO across AWS SageMaker, Databricks, and Internal EC2/Metaflow clusters.
- Operational Trade-offs: Real-world insights into the performance, cost, and traceability of different cloud implementations.
- Configuration-Driven Development: How an abstract library layer allows us to scale experimentation across hundreds of production models.
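To make the core idea concrete: "Pipeline-as-a-Trial" means each optimization trial runs the entire pipeline end-to-end, so the objective reflects the global system rather than one isolated component. The following is a minimal toy sketch of that loop in plain Python (no Ray); the function names and the error formula are illustrative assumptions, not Zalando's actual implementation.

```python
import itertools

def run_pipeline(config):
    """One trial = one end-to-end pipeline run (toy stand-in).

    The toy error formula makes the stages interact: more smoothing
    helps in one regime but hurts as the horizon grows, so tuning one
    knob in isolation (local optimization) can worsen the end metric.
    """
    smoothing = config["smoothing"]
    horizon = config["horizon"]
    error = abs(smoothing - 0.5) + 0.1 * horizon * smoothing
    return {"global_error": error}

def grid_search(search_space):
    """Score every configuration on the pipeline-level objective."""
    keys = list(search_space)
    best_cfg, best_err = None, float("inf")
    for values in itertools.product(*search_space.values()):
        cfg = dict(zip(keys, values))
        err = run_pipeline(cfg)["global_error"]
        if err < best_err:
            best_cfg, best_err = cfg, err
    return best_cfg, best_err

best_cfg, best_err = grid_search(
    {"smoothing": [0.1, 0.3, 0.5], "horizon": [1, 2]}
)
print(best_cfg, round(best_err, 3))
```

In a real distributed setup, `run_pipeline` would be the trainable handed to an HPO framework such as Ray Tune, with each trial scheduled on the cluster instead of a sequential grid loop.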
Stop chasing local solutions. Join me to learn how to build a distributed HPO framework that optimizes for your global business objectives.
PS: if you are a "Rick and Morty" fan, definitely join to see how Rick fell into the local optimization problem!
Abdullah Taha
Data/MLOps Engineer at Zalando. Throughout my career I have worked alongside data scientists to build robust ML pipelines. I am enthusiastic about designing and implementing scalable, robust systems.