AI ALIGNMENT FORUM

Zhijing Jin
Ω1100

Posts

Wikitag Contributions

Comments

No Comments Found
No wikitag contributions to display.
1
Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability
3mo
0
13
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
4mo
0