AI Alignment Posts

Introducing the AI Alignment Forum (FAQ)

6 min read · 0 comments

Alignment Newsletter #36

10 min read · 0 comments

Figuring out what Alice wants: non-human Alice

1 min read · 7 comments

Assuming we've solved X, could we do Y...

2 min read · 2 comments

COEDT Equilibria in Games

3 min read · 0 comments

Why we need a *theory* of human values

4 min read · 0 comments

Factored Cognition

16 min read · 1 comment

Alignment Newsletter #35

6 min read · 0 comments

Coherence arguments do not imply goal-directed behavior

7 min read · 9 comments

Benign model-free RL

7 min read · 0 comments