You've shipped. Users are complaining but you can't tell why β is it the retriever? The prompt? The model? One metric can't tell you. You need the full scorecard.