Meg — AI Alignment Forum
Posts
82
Auditing language models for hidden objectives
9mo
3
118
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
2y
69
49
Steering Llama-2 with contrastive activation additions
2y
23
39
Towards Understanding Sycophancy in Language Models
2y
0
56
Paper: LLMs trained on “A is B” fail to learn “B is A”
2y
0
44
Paper: On measuring situational awareness in LLMs
2y
13
Comments