AI ALIGNMENT FORUM

Nicholas Schiefer
Ω7000

Posts

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (118 karma, 2y ago, 69 comments)
Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research (122 karma, 2y ago, 14 comments)
Engineering Monosemanticity in Toy Models (37 karma, 3y ago, 4 comments)
ELK Proposal - Make the Reporter care about the Predictor’s beliefs (6 karma, 3y ago, 0 comments)