Nicholas Schiefer — AI Alignment Forum
Nicholas Schiefer
Posts
Sorted by New
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (118 karma, 2y, 69 comments)
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research (122 karma, 2y, 14 comments)
Engineering Monosemanticity in Toy Models (37 karma, 3y, 4 comments)
ELK Proposal - Make the Reporter care about the Predictor’s beliefs (6 karma, 3y, 0 comments)
Comments