AI ALIGNMENT FORUM

Dr_Manhattan (Ω 2010)

Comments

New safety research agenda: scalable agent alignment via reward modeling
Dr_Manhattan · 7y · 20

They mention and link to iterated amplification in the Medium article.

Scaling up
In the long run, we would like to scale reward modeling to domains that are too complex for humans to evaluate directly. To do this, we need to boost the user’s ability to evaluate outcomes. We discuss how reward modeling can be applied recursively: we can use reward modeling to train agents to assist the user in the evaluation process itself. If evaluation is easier than behavior, this could allow us to bootstrap from simpler tasks to increasingly general and more complex tasks. This can be thought of as an instance of iterated amplification.
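
To make the recursion in that passage concrete, here is a minimal toy sketch in Python. All names (RewardModel, Agent, train_level, recursive_reward_modeling) are hypothetical and the logic is heavily simplified; it only illustrates the bootstrapping idea described above, in which agents trained on simpler tasks assist the user in evaluating harder ones, and is not DeepMind's actual implementation.

```python
"""Toy sketch of recursive reward modeling (hypothetical names, simplified logic)."""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RewardModel:
    """Stand-in for a reward model trained from evaluator feedback on trajectories."""
    feedback: List[tuple] = field(default_factory=list)

    def update(self, trajectory, score: float) -> None:
        # In practice this would be a gradient step; here we just record the feedback.
        self.feedback.append((trajectory, score))

    def reward(self, trajectory) -> float:
        # Toy stand-in for a learned reward predictor.
        return float(len(self.feedback))


@dataclass
class Agent:
    """Agent trained (by RL, in the real setting) to maximize a learned reward."""
    reward_model: RewardModel

    def act(self, task):
        # Placeholder policy: return the task itself as a "trajectory".
        return task

    def assist_evaluation(self, trajectory) -> float:
        # Helper agents make evaluation easier by scoring pieces the user can't judge alone.
        return self.reward_model.reward(trajectory)


def train_level(task, evaluate: Callable) -> Agent:
    """Train one level: fit a reward model from evaluations, then an agent on it."""
    rm = RewardModel()
    trajectory = task  # placeholder rollout
    rm.update(trajectory, evaluate(trajectory))
    return Agent(reward_model=rm)


def recursive_reward_modeling(tasks: List, user_evaluate: Callable) -> Agent:
    """Bootstrap: agents trained on simpler tasks assist evaluation of harder ones."""
    helpers: List[Agent] = []

    def assisted_evaluate(trajectory) -> float:
        # The user's judgment, boosted by the helper agents' assessments.
        assistance = sum(h.assist_evaluation(trajectory) for h in helpers)
        return user_evaluate(trajectory) + assistance

    agent = None
    for task in tasks:  # tasks ordered from simple to complex
        agent = train_level(task, assisted_evaluate)
        helpers.append(agent)  # this agent now helps evaluate the next level
    return agent


if __name__ == "__main__":
    top_agent = recursive_reward_modeling(
        tasks=["easy task", "harder task", "hardest task"],
        user_evaluate=lambda traj: 1.0,  # stand-in for human feedback
    )
    print("Top-level reward model stored", len(top_agent.reward_model.feedback), "feedback item(s)")
```

The key design point the quote makes is the last loop: each trained agent joins the pool of evaluation assistants, so the (user + helpers) evaluator can keep up with tasks the user could not evaluate directly, which is why this counts as an instance of iterated amplification.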