This post is a copy of the introduction of this paper on the Reversal Curse. Authors: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans. Abstract: We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained...

Sep 23, 2023

Paper: On measuring situational awareness in LLMs

by Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, Asa Cooper Stickland, Meg, and Maximilian Kaufmann

This post is a copy of the introduction of this paper on situational awareness in LLMs. Authors: Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans. Abstract: We aim to better understand the emergence of situational awareness in large language models (LLMs)....

Sep 4, 2023