This Week’s Learning Nuggets (W24-2025)
A quick‑fire recap of what I learned from this week’s reading binge.
1. Uber (2017) – “Michelangelo: ML at Uber Scale”
Key takeaways
Uber had a full feature‑store → training → online‑serving loop as early as 2015‑16.
Everything was on‑prem to shave ≈ 40 ms network latency and dodge egress fees.
Supported Gradient‑Boosted Decision Trees (GBDTs), forecasting pipelines, and early Convolutional Neural Networks (CNNs); that classic stack still covers ~70 % of Uber's day‑to‑day ML today.
Drift & quality monitors were baked in from day one; pretty forward‑thinking for the time.
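The loop above can be sketched in a few lines. This is a minimal toy, not Uber's API: `FeatureStore`, `Model`, `train`, and `serve` are all invented names, and the "training" is a one-weight fit standing in for a real GBDT. The point is the shape of the loop: training and online serving both read from the same feature store.

```python
class FeatureStore:
    """Features keyed by entity id, shared by training and serving."""
    def __init__(self):
        self._rows = {}

    def put(self, key, feats):
        self._rows[key] = feats

    def get(self, key):
        return self._rows[key]


class Model:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, feats):
        return self.bias + sum(self.weights.get(k, 0.0) * v
                               for k, v in feats.items())


def train(store, labeled_keys, feature="dist"):
    # Toy "fit": one weight = mean(label) / mean(feature). A real system
    # fits GBDTs here, but it pulls features from the same store.
    xs = [store.get(k)[feature] for k, _ in labeled_keys]
    ys = [y for _, y in labeled_keys]
    w = (sum(ys) / len(ys)) / (sum(xs) / len(xs))
    return Model({feature: w}, 0.0)


def serve(model, store, key):
    # Online serving pulls the SAME features used at training time,
    # which is what kills training/serving skew.
    return model.predict(store.get(key))
```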
2. Uber (2025) – “From Predictive → Generative AI”
Key takeaways
Michelangelo now sports an LLM Gateway, Vector Store and Guardrails layer on top of the classic stack.
Uses Ray + DeepSpeed for distributed fine‑tuning across elastic on‑prem GPU pools.
The same feature store now holds embeddings alongside tabular features.
A federation job‑controller picks the cheapest cluster with capacity for every workload.
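The federation job‑controller idea reduces to a simple selection rule: among clusters with enough free capacity, take the cheapest. A sketch under invented names and numbers (the post doesn't describe Uber's actual scheduler internals):

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_gpus: int
    cost_per_gpu_hour: float


def pick_cluster(clusters, gpus_needed):
    """Route a workload to the cheapest cluster that still has capacity."""
    candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not candidates:
        raise RuntimeError("no cluster has capacity for this workload")
    return min(candidates, key=lambda c: c.cost_per_gpu_hour)
```

In practice the controller would also weigh data locality and queue depth, but cost-with-capacity is the core of the decision.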
3. Lyft – ByteByteGo “How Lyft Uses ML to Make $100 M” (2025)
Key takeaways
Lyft's model fleet handles > 1 million predictions per second across pricing, ETA, incentives and fraud.
Each team ships its own containerised micro‑service (`load()` / `predict()` contract) instead of one central model‑server.
The model self‑tests mentioned are developer‑supplied pairs of dummy input → expected output. Great for smoke checks, but not a substitute for robust shadow‑traffic or canary testing.
A new reinforcement‑learning (RL) matching model alone adds ≈ $30 million in annual revenue.
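The `load()`/`predict()` contract plus self‑test pairs fits in a screenful. A hypothetical sketch (model, fare numbers, and test pairs are all made up, not Lyft's):

```python
class PricingModel:
    """One team's micro-service model, honoring the load()/predict() contract."""

    def load(self):
        # A real service would deserialize trained weights here.
        self.base_fare = 2.5
        self.per_km = 1.2
        return self

    def predict(self, features):
        return self.base_fare + self.per_km * features["distance_km"]


# Developer-supplied self-test pairs: dummy input -> expected output.
# Cheap smoke checks at deploy time, not a replacement for shadow traffic.
SELF_TESTS = [
    ({"distance_km": 0.0}, 2.5),
    ({"distance_km": 10.0}, 14.5),
]


def run_self_tests(model):
    m = model.load()
    return all(abs(m.predict(x) - y) < 1e-9 for x, y in SELF_TESTS)
```

A deploy pipeline would call `run_self_tests` before routing any real traffic to the container.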
4. Lyft – “Interactive Development on K8s” (Talk)
Key takeaways
One‑click Jupyter or RStudio pod in seconds; idle notebooks auto‑hibernate to save GPU dollars.
Menu‑driven choice of CPU/GPU size and base Docker image → “Colab inside the VPC” vibe.
Clicking Save Model snapshots the notebook into a container that flows straight into training & serving.
Cluster access is deliberately fire‑walled: users never hold a `kubectl` credential. All Kubernetes objects (Pod, Service, NetworkPolicy, ServiceAccount) are created by a control‑plane service, and traffic is funneled through an Envoy gateway that enforces identity headers and NetworkPolicies. This constraint is what lets Lyft rely on a single shared namespace without sacrificing isolation.
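The control‑plane pattern means users submit a request and a service emits the manifests on their behalf. A sketch of what that service might generate for a notebook pod; field values and labels are invented, not Lyft's actual manifests:

```python
def notebook_pod_manifest(user, image, gpu=0):
    """Build the Pod spec a control-plane service would submit for a user."""
    resources = {"limits": {"nvidia.com/gpu": gpu}} if gpu else {}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"notebook-{user}",
            "namespace": "notebooks",        # single shared namespace
            "labels": {"owner": user},       # identity checked at the gateway
        },
        "spec": {
            "serviceAccountName": f"nb-{user}",  # per-user, never user-held
            "containers": [{
                "name": "jupyter",
                "image": image,
                "resources": resources,
            }],
        },
    }
```

Because only the control plane holds cluster credentials, the owner label and per‑user ServiceAccount are trustworthy, which is what makes the shared‑namespace model safe.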
Big picture
Whether on‑prem (Uber / Lyft) or fully‑managed cloud, the core pattern repeats: feature store → training orchestration → registry → low‑latency serving → monitoring.
Generative‑AI capabilities layer on as a gateway/guardrail service, rather than replacing the existing predictive stack.