AI ALIGNMENT FORUM

voyantvoid

Comments

Paper: On measuring situational awareness in LLMs
voyantvoid, 2y ago

The Reversal Curse seems to match the lack of bidirectionality in ROME edits described here: https://www.alignmentforum.org/posts/QL7J9wmS6W2fWpofd/but-is-it-really-in-rome-an-investigation-of-the-rome-model
