Your team merges 5 PRs in a sprint. One of them, a 3-line prompt change, destroys faithfulness. The other 4 PRs are fine. No eval ran in CI. You find out when users complain 3 days later. git bisect takes 2 hours. The fix takes 5 minutes. The damage: 3 days × 40K users.