AI ALIGNMENT FORUM
Mikita Balesni
Posts (sorted by new)

Karma · Title · Posted · Comments
69 · Frontier Models are Capable of In-context Scheming · 2d · 5
29 · Toward Safety Cases For AI Scheming · 1mo · 0
42 · Apollo Research 1-year update · 6mo · 0
14 · How I select alignment research projects · 8mo · 4
27 · A starter guide for evals · 11mo · 0
33 · Understanding strategic deception and deceptive alignment · 1y · 0
57 · Paper: LLMs trained on “A is B” fail to learn “B is A” · 1y · 0
44 · Paper: On measuring situational awareness in LLMs · 1y · 13
89 · Announcing Apollo Research · 2y · 4