AI ALIGNMENT FORUM

Jailbreaking (AIs)

This page is a stub.
Posts tagged Jailbreaking (AIs)
25 Role embeddings: making authorship more salient to LLMs
Nina Panickssery, Christopher Ackerman
9mo
0
16 Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave, Kellin Pelrine
8mo
0