A user reports: 'Your AI gave me completely wrong advice.' You check your logs and find three fields: timestamp, user_id, response_text. Nothing else. You can't tell which documents were retrieved, what the model actually received, which prompt version ran, or why it responded that way. You're flying blind.
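
To make the gap concrete, here is a minimal sketch of what a richer log record could look like. Everything beyond `timestamp`, `user_id`, and `response_text` is an assumption made for illustration: the field names `request_id`, `prompt_version`, `model_name`, `model_input`, and `retrieved_doc_ids` are hypothetical, as are the sample values. It does not answer "why" on its own, but it records the inputs you would need to investigate.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class TraceRecord:
    # Fields the scenario's log already has (timestamp is filled in below).
    user_id: str
    response_text: str
    # Hypothetical extra fields covering what the scenario says is missing.
    prompt_version: str   # which prompt template version ran
    model_name: str       # which model produced the response
    model_input: str      # the exact text sent to the model
    retrieved_doc_ids: List[str] = field(default_factory=list)  # retrieval results
    # Generated per record so one response can be looked up later.
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json_line(self) -> str:
        """Serialize the record as a single JSON line for a log file."""
        return json.dumps(asdict(self), sort_keys=True)


# Example with made-up values.
record = TraceRecord(
    user_id="u-123",
    response_text="You can safely skip the migration step.",
    prompt_version="support-v7",
    model_name="example-model",
    model_input="SYSTEM: ...\nCONTEXT: kb-42, kb-108\nUSER: Do I need to migrate?",
    retrieved_doc_ids=["kb-42", "kb-108"],
)
print(record.to_json_line())
```

With a record like this, the same complaint can be traced back to a specific prompt version, the retrieved documents, and the exact model input, instead of only the final response text.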