Strengthening Red Teams: A Modular Scaffold for Control Evaluations
Agentic sabotage occurs when an AI system intentionally takes actions that compromise safety in an environment while pursuing its own misaligned objectives. We evaluate this behavior in SHADE-Arena, a dataset for assessing sabotage and monitoring capabilities of LLM agents. We decompose agentic sabotage capability into five core components—knowing what overseers...
Nov 11, 2025