Iterated Amplification

This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.

Preface to the sequence on iterated amplification

Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem

Paul Christiano

Clarifying "AI Alignment"

Paul Christiano

An unaligned benchmark

Paul Christiano

Prosaic AI alignment

Paul Christiano

Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents

Paul Christiano

Approval-directed bootstrapping

Paul Christiano

Humans Consulting HCH

Paul Christiano

Corrigibility

Paul Christiano

The scheme

The core of the sequence is the third section. "Benign model-free RL" describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find that one of those descriptions is clearest for you, I recommend focusing on that one and skimming the others.

Iterated Distillation and Amplification

Ajeya Cotra

Benign model-free RL

Paul Christiano

Factored Cognition

Andreas Stuhlmüller

Supervising strong learners by amplifying weak experts

Paul Christiano

AlphaGo Zero and capability amplification

Paul Christiano

What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.

Directions and desiderata for AI alignment

Paul Christiano