Iterated Amplification

Oct 29, 2018 by Paul Christiano

This is a sequence curated by Paul Christiano on one current approach to alignment: Iterated Amplification.

Preface to the sequence on iterated amplification (Paul Christiano)

Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem (Paul Christiano)
Clarifying "AI Alignment" (Paul Christiano)
An unaligned benchmark (Paul Christiano)
Prosaic AI alignment (Paul Christiano)

Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents (Paul Christiano)
Approval-directed bootstrapping (Paul Christiano)
Humans Consulting HCH (Paul Christiano)
Corrigibility (Paul Christiano)

The scheme

The core of the sequence is the third section. Benign model-free RL describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find one of those descriptions clearest, I recommend focusing on it and skimming the others. A minimal code sketch of the overall loop appears after the list below.

Iterated Distillation and Amplification (Ajeya Cotra)
Benign model-free RL (Paul Christiano)
Factored Cognition (Andreas Stuhlmüller)
Supervising strong learners by amplifying weak experts (Paul Christiano)
AlphaGo Zero and capability amplification (Paul Christiano)

What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.

Directions and desiderata for AI alignment (Paul Christiano)
The reward engineering problem (Paul Christiano)
Capability amplification (Paul Christiano)
Learning with catastrophes (Paul Christiano)

Possible approaches

The fifth section of the sequence breaks down some of these problems further and describes some possible approaches.

Thoughts on reward engineering (Paul Christiano)
Techniques for optimizing worst-case performance (Paul Christiano)
Reliability amplification (Paul Christiano)
Security amplification (Paul Christiano)
Meta-execution (Paul Christiano)