AI ALIGNMENT FORUM

1037
Max Kaufmann
Ω7000

Posts


Comments

More examples of goal misgeneralization
Maximilian Kaufmann 3y 80

How hard was it to find the examples of goal misgeneralization? Did the results take much “coaxing”?

56 Paper: LLMs trained on “A is B” fail to learn “B is A”
2y
0
44 Paper: On measuring situational awareness in LLMs
2y
13