In terms of content, this has a lot of overlap with *Reward is not the optimization target*. I'm basically rewriting part of that post in language I personally find clearer, emphasising what I think is the core insight.
When thinking about deception and RLHF training, a simplified threat model goes something like this (a toy sketch follows the list):
- A model takes some actions.
- If a human approves of these actions, the human gives the model some reward.
- Humans can be deceived into giving reward in situations where, if they had more knowledge, they would not.
- Models will take advantage of this to get more reward.
- Models will therefore become deceptive.
Before continuing, I would encourage you to really engage with...