Iterated Amplification

Oct 29, 2018

by paulfchristiano

This is an upcoming sequence, curated by Paul Christiano, on one current approach to alignment: Iterated Amplification. The posts will be released over the coming weeks, on average one every 2-3 days.

Preface to the sequence on iterated amplification

Problem statement

The first part of this sequence clarifies the problem that iterated amplification is trying to solve, which is both narrower and broader than you might expect.

The Steering Problem
Clarifying "AI Alignment"
An unaligned benchmark
Prosaic AI alignment

Basic intuition

The second part of the sequence outlines the basic intuitions that motivate iterated amplification. I think that these intuitions may be more important than the scheme itself, but they are considerably more informal.

Approval-directed agents
Approval-directed bootstrapping
Humans Consulting HCH
Corrigibility

The scheme

The core of the sequence is the third section. Benign model-free RL describes iterated amplification as a general outline into which we can substitute arbitrary algorithms for reward learning, amplification, and robustness. The first four posts all describe variants of this idea from different perspectives; if you find that one of those descriptions is clearest for you, I recommend focusing on that one and skimming the others.
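As a rough illustration of the overall loop, here is a minimal, runnable sketch in Python. Everything in it is an illustrative assumption of mine rather than code from the posts: the Agent type, the string-based amplify and distill stand-ins, and the iterated_amplification driver. The intended shape is that amplification makes the current agent stronger but slower (e.g. an overseer decomposing a question with the agent's help), while distillation would, in a real system, train a fast learner to approximate the amplified overseer.

```python
from typing import Callable

# An "agent" here is just a function from a question to an answer.
Agent = Callable[[str], str]

def human(question: str) -> str:
    """Stand-in for the slow, trusted human overseer."""
    return f"human judgment on: {question}"

def amplify(agent: Agent) -> Agent:
    """Stand-in amplification: answer by consulting the current agent
    on a decomposed sub-question (stronger, but slower)."""
    def overseer(question: str) -> str:
        sub = agent(f"sub-question of: {question}")
        return f"answer to '{question}' built from [{sub}]"
    return overseer

def distill(overseer: Agent) -> Agent:
    """Stand-in distillation: a real system would train a fast model
    (e.g. via reward learning) to approximate the overseer; here we
    simply pass the overseer through unchanged."""
    return overseer

def iterated_amplification(initial: Agent, rounds: int) -> Agent:
    """Alternate amplification and distillation, as in the scheme above."""
    agent = initial
    for _ in range(rounds):
        agent = distill(amplify(agent))
    return agent

if __name__ == "__main__":
    final = iterated_amplification(human, rounds=2)
    print(final("How should the system act?"))
```

The point of alternating the two steps, as the posts below spell out, is that each distilled agent is fast enough to be consulted many times inside the next round of amplification, so capability can grow while (one hopes) the alignment properties of the original overseer are preserved.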

Iterated Distillation and Amplification
Benign model-free RL
Factored Cognition
[Link] Supervising strong learners by amplifying weak experts
AlphaGo Zero and capability amplification

What needs doing

The fourth part of the sequence describes some of the black boxes in iterated amplification and discusses what we would need to do to fill in those boxes. I think these are some of the most important open questions in AI alignment.

Directions and desiderata for AI alignment
The reward engineering problem
Capability amplification
Learning with catastrophes

Possible approaches

The fifth section of the sequence breaks down some of these problems further and describes some possible approaches.

Thoughts on reward engineering
Techniques for optimizing worst-case performance
Reliability amplification