AI ALIGNMENT FORUM
Mikita Balesni
Posts
A starter guide for evals (24 karma, 3mo, 0 comments)
Understanding strategic deception and deceptive alignment (26 karma, 6mo, 0 comments)
Paper: LLMs trained on “A is B” fail to learn “B is A” (57 karma, 6mo, 0 comments)
Paper: On measuring situational awareness in LLMs (44 karma, 7mo, 13 comments)
Announcing Apollo Research (88 karma, 10mo, 4 comments)