This Week's Learning Nuggets (W25-2025)

Jun 23, 2025

As always, these notes capture my current understanding. Fully expect future‑me to refactor or flat‑out contradict some of this later.

1. LLM Quantization Moves the Pareto Frontier

Reading: ParetoQ scaling laws in extremely low‑bit LLM quantisation

Quick gist: ParetoQ shows that 2‑ and even 1.58‑bit weight‑only models can match 4‑bit accuracy if you fine‑tune for ~10 % of the pre‑training budget. That nudges the size-accuracy Pareto curve up and left.
Edge‑deployments are one of the few ways we’ll run private LLMs without melting graphics‑processing units (GPUs). Cheaper inference ≠ free robustness, though. Math/code reasoning still dips unless you sprinkle back a few higher‑precision layers or run a tiny Low‑Rank Adaptation (LoRA).
Mental‑note: Weight‑only is the low‑hanging fruit. Activation quantization is the next cliff.

Reading: Waymo — Safe to Deploy

Waymo published a 12‑point “absence of unreasonable risk” checklist that every new software build must pass before a driver‑out rollout.
Design‑time proof > rear‑view metrics. One buggy binary ships to a fleet, not a single driver; correlated risk demands aviation‑style process.
Pragmatic and moat‑building: only deep‑pocket teams can fund the Monte‑Carlo + closed‑course coverage they require. Smaller autonomous‑vehicle (AV) startups need shared scenario libraries or they’re frozen out.

Sticky insight: a human crash is tragic but local; a software crash is global at machine speed.

Reading: Meta — Our new model helps AI think before it acts

V‑JEPA 2 masks chunks of video and predicts their latent embeddings, not RGB pixels. This dodges the usual blurry‑future plague and gives you a cheap “physics sandbox.”
Pre‑trained on a million hours of internet video + 6 h of robot fine‑tune → a six‑degrees‑of‑freedom (6‑DoF) arm can pick & stack unseen objects out‑of‑the‑box.
Potential crossover: dash‑cam pre‑train ➟ fine‑tune on fleet logs. Metric grounding and long‑horizon drift still unsolved, but I smell a playbook for smaller robotics teams.

Reading: Google Cloud outage incident OW5I3… + ByteByteGo breakdown

One malformed quota‑policy row hit an un‑handled null pointer in Service Control—Google’s inline API auth/quota gate in front of every API.
Instant Spanner replication → global crash‑loop in < 2 min. The real downtime (3 h) came from a self‑inflicted thundering‑herd of restarts starving Spanner.
Take‑aways:
- Config deserves the same staged rollout as code.
- Critical inline services need a fail‑open brown‑out mode for read‑only requests.
- Randomised back‑off is not “nice to have”; it’s a distributed denial‑of‑service (DDoS) antidote.

End of notes; happy to hear what resonated / what I botched!