Observability, online vs offline eval, drift detection, SLOs, A/B testing
5 sections
Learn to instrument LLM systems for full observability — what to trace, what to log, how to wire LangSmith or Phoenix, and what you need to debug production failures in under 5 minutes.
Learn when to use offline eval (before deploy) vs online eval (in production), how to design each, and how to combine them into an eval strategy that catches failures at every stage.
Learn statistical process control for LLM quality — how to detect metric drift, apply CUSUM and rolling averages, and build alert systems that fire weeks before user satisfaction drops.
Learn to define and enforce Service Level Objectives (SLOs) for LLM systems across the three operational axes that determine whether your system is sustainable and predictable at scale.
Learn shadow mode, canary deployments, and champion-challenger testing for LLM systems — the patterns that let you validate model changes safely before full rollout.