AI ALIGNMENT FORUM

553
Kellin Pelrine

Posts

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability (1 karma · 4mo · 0 comments)
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (16 karma · 8mo · 0 comments)
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (8 karma · 1y · 0 comments)
Even Superhuman Go AIs Have Surprising Failure Modes (34 karma · 2y · 9 comments)