AI ALIGNMENT FORUM

1575
Zhijing Jin
Ω1100

Posts

1 Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability
4mo
0
13 Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
6mo
0