πŸ”

Did it call the right tools?

A database agent was given read-only permissions for good reason. In testing, it only ever called SELECT queries. In production, a specific input pattern triggered it to call the DELETE tool instead. The evaluation had never tested that pattern.

1 / 10