This Week’s Learning Nuggets (W24-2025)
A quick‑fire recap of what I learned from this week’s reading binge.
1. Uber (2017) – “Michelangelo: ML at Uber Scale”
Key takeaways
Uber had a full feature‑store → training → online‑serving loop as early as 2015‑16.
Everything was on‑prem to shave ≈ 40 ms network latency and dodge egress fees.
Supported Gradient‑Boosted Decision Trees (GBDTs), forecasting pipelines, and early Convolutional Neural Networks (CNNs); that classic stack still covers ~70 % of Uber's day‑to‑day ML today.
Drift & quality monitors were baked in from day one; pretty forward‑thinking for the time.
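The loop above can be sketched in a few lines. This is a minimal toy, not Uber's API: `FeatureStore`, `Model`, `train`, and `serve` are all invented names, and the "training" is a one-weight fit standing in for a real GBDT. The point is the shape of the loop: training and online serving both read from the same feature store.

```python
class FeatureStore:
    """Features keyed by entity id, shared by training and serving."""
    def __init__(self):
        self._rows = {}

    def put(self, key, feats):
        self._rows[key] = feats

    def get(self, key):
        return self._rows[key]


class Model:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, feats):
        return self.bias + sum(self.weights.get(k, 0.0) * v
                               for k, v in feats.items())


def train(store, labeled_keys, feature="dist"):
    # Toy "fit": one weight = mean(label) / mean(feature). A real system
    # fits GBDTs here, but it pulls features from the same store.
    xs = [store.get(k)[feature] for k, _ in labeled_keys]
    ys = [y for _, y in labeled_keys]
    w = (sum(ys) / len(ys)) / (sum(xs) / len(xs))
    return Model({feature: w}, 0.0)


def serve(model, store, key):
    # Online serving pulls the SAME features used at training time,
    # which is what kills training/serving skew.
    return model.predict(store.get(key))
```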
2. Uber (2025) – “From Predictive → Generative AI”
Key takeaways
Michelangelo now sports an LLM Gateway, Vector Store and Guardrails layer on top of the classic stack.
Uses Ray + DeepSpeed for distributed fine‑tuning across elastic on‑prem GPU pools.
The same feature store now holds embeddings alongside tabular features.
A federation job‑controller picks the cheapest cluster with capacity for every workload.
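The federation job‑controller idea reduces to a simple selection rule: among clusters with enough free capacity, take the cheapest. A sketch under invented names and numbers (the post doesn't describe Uber's actual scheduler internals):

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_gpus: int
    cost_per_gpu_hour: float


def pick_cluster(clusters, gpus_needed):
    """Route a workload to the cheapest cluster that still has capacity."""
    candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no cluster has capacity for this workload")
    return min(candidates, key=lambda c: c.cost_per_gpu_hour)
```

In practice the controller would also weigh data locality and queue depth, but cost-with-capacity is the core of the decision.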
3. Lyft – ByteByteGo “How Lyft Uses ML to Make $100 M” (2025)
Key takeaways
Lyft's model fleet handles > 1 million predictions per second across pricing, ETA, incentives and fraud.
Each team ships its own containerised micro‑service (`load()` / `predict()` contract) instead of one central model‑server.
The model self‑tests mentioned are developer‑supplied pairs of dummy input → expected output. Great for smoke checks, but not a substitute for robust shadow‑traffic or canary testing.
A new reinforcement‑learning (RL) matching model alone adds ≈ $30 million in annual revenue.
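The `load()`/`predict()` contract plus self‑test pairs fits in a screenful. A hypothetical sketch (model, fare numbers, and test pairs are all made up, not Lyft's):

```python
class PricingModel:
    """One team's micro-service model, honoring the load()/predict() contract."""

    def load(self):
        # A real service would deserialize trained weights here.
        self.base_fare = 2.5
        self.per_km = 1.2
        return self

    def predict(self, features):
        return self.base_fare + self.per_km * features["distance_km"]


# Developer-supplied self-test pairs: dummy input -> expected output.
# Cheap smoke checks at deploy time, not a replacement for shadow traffic.
SELF_TESTS = [
    ({"distance_km": 0.0}, 2.5),
    ({"distance_km": 10.0}, 14.5),
]


def run_self_tests(model):
    m = model.load()
    return all(abs(m.predict(x) - y) < 1e-9 for x, y in SELF_TESTS)
```

A deploy pipeline would call `run_self_tests` before routing any real traffic to the container.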
4. Lyft – “Interactive Development on K8s” (Talk)
Key takeaways
One‑click Jupyter or RStudio pod in seconds; idle notebooks auto‑hibernate to save GPU dollars.
Menu‑driven choice of CPU/GPU size and base Docker image → “Colab inside the VPC” vibe.
Clicking Save Model snapshots the notebook into a container that flows straight into training & serving.
Cluster access is deliberately fire‑walled: users never hold a `kubectl` credential. All Kubernetes objects (Pod, Service, NetworkPolicy, ServiceAccount) are created by a control‑plane service, and traffic is funneled through an Envoy gateway that enforces identity headers and NetworkPolicies. This constraint is what lets Lyft rely on a single shared namespace without sacrificing isolation.
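The control‑plane pattern means users submit a request and a service emits the manifests on their behalf. A sketch of what that service might generate for a notebook pod; field values and labels are invented, not Lyft's actual manifests:

```python
def notebook_pod_manifest(user, image, gpu=0):
    """Build the Pod spec a control-plane service would submit for a user."""
    resources = {"limits": {"nvidia.com/gpu": gpu}} if gpu else {}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"notebook-{user}",
            "namespace": "notebooks",        # single shared namespace
            "labels": {"owner": user},       # identity checked at the gateway
        },
        "spec": {
            "serviceAccountName": f"nb-{user}",  # per-user, never user-held
            "containers": [{
                "name": "jupyter",
                "image": image,
                "resources": resources,
            }],
        },
    }
```

Because only the control plane holds cluster credentials, the owner label and per‑user ServiceAccount are trustworthy, which is what makes the shared‑namespace model safe.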
Big picture
Whether on‑prem (Uber / Lyft) or fully‑managed cloud, the core pattern repeats: feature store → training orchestration → registry → low‑latency serving → monitoring.
Generative‑AI capabilities layer on as a gateway/guardrail service, rather than replacing the existing predictive stack.