🧠

Fine-Tuned Models

Behavioral regression testing, benchmarking, alignment verification

Evals, Braintrust, lm-evaluation-harness

3 sections

Did fine-tuning break anything?

Learn behavioral regression testing for fine-tuned models: systematically verifying that training did not silently degrade capabilities you never meant to change.

9 min

Braintrust, Evals

Benchmarking fine-tuned vs base vs frontier

Learn to build a representative benchmark, compare fine-tuned models against their base models and frontier alternatives, and avoid the eval-set overfitting trap that makes numbers look better than they are.

9 min

Braintrust, lm-evaluation-harness

Does it behave the way you intended?

Learn alignment verification for fine-tuned models: detecting reward hacking, sycophancy, and unintended behavioral shortcuts that make a model look good on metrics while it fails the actual goal.

9 min

Evals, Braintrust
