| Comment Author | Post | Deleted By User | Deleted Date | Deleted Public | Reason |
|---|---|---|---|---|---|
| | Base LLMs refuse too | Arthur Conmy | | false | Think I am wrong |
| | Backdoors as an analogy for deceptive alignment | Raemon | | false | |
| | Defining alignment research | DanielFilan | | true | |
| | TurnTrout's shortform feed | habryka | | false | |
| | Understanding and controlling a maze-solving policy network | Linda Linsefors | | true | |
| | Understanding and controlling a maze-solving policy network | Linda Linsefors | | true | |
| | Sycophancy to subterfuge: Investigating reward tampering in large language models | ryan_greenblatt | | true | moved to make it take up less space. |
| | Fabien's Shortform | ryan_greenblatt | | true | I somehow missed "We had the idea a few times to try out a detection-based approach but we didn't get around to it." |
| | 2. Corrigibility Intuition | Rubi J. Hudson | | false | |
| | Some Rules for an Algebra of Bayes Nets | johnswentworth | | true | I put this in the wrong place |

| Author | Post | Banned Users |
|---|---|---|
| | Asymptotically Unambitious AGI | |

| ID | Banned From Frontpage | Banned from Personal Posts |
|---|---|---|

| User | Ended at | Type |
|---|---|---|
| | | allComments |
| | | allComments |
| | | allPosts |
| | | allComments |