AI ALIGNMENT FORUM
1638
Maximilian Kaufmann — AI Alignment Forum
Max Kaufmann
Posts
Comments
More examples of goal misgeneralization
Maximilian Kaufmann
3y
8
0
How hard was it to find the examples of goal misgeneralization? Did the results take much “coaxing”?
56
Paper: LLMs trained on “A is B” fail to learn “B is A”
2y
0
44
Paper: On measuring situational awareness in LLMs
2y
13