AI ALIGNMENT FORUM

Jailbreaking (AIs) — AI Alignment Forum

This page is a stub.

Posts tagged Jailbreaking (AIs)

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

TurnTrout, Rohin Shah

9mo

Role embeddings: making authorship more salient to LLMs

Nina Panickssery, Christopher Ackerman

2y

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave, Kellin Pelrine

1y