AI ALIGNMENT FORUM

Fiora Starlight — AI Alignment Forum

Just an autist in search of a key that fits every hole.

1287

Ω

63

6

70

3y

Did Claude 3 Opus align itself via gradient hacking?

> Claude 3 Opus is unusually aligned because it’s a friendly gradient hacker. It’s definitely way more aligned than any explicit optimization targets Anthropic set and probably the reward model’s judgments. [...] Maybe I will have to write a LessWrong post 😣
>
> —Janus, who did not in fact...