Models That Know How Evaluations Are Designed Score Safer • Paper • 2605.28591 • Published 5 days ago
🕵️🛡️ Evaluation Meta Knowledge Collection • 2026 arXiv preprint. Models fine-tuned on documents describing typical evaluation traits behave more safely, with increased refusal rates and low … • 7 items • Updated 3 days ago