AI ALIGNMENT FORUM

The Causes of Power-seeking and Instrumental Convergence

Jul 05, 2021 by TurnTrout

Instrumental convergence posits that smart goal-directed agents will tend to take certain kinds of actions (e.g., gaining resources, staying alive) in order to achieve their goals. These actions often involve taking power from humans, and human disempowerment seems like a key part of how AI might go very, very wrong.

But where does instrumental convergence come from? When does it occur, and how strongly? And what does the math look like?
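
As a taste of the formalism developed in this sequence: the first post (whose paper version is Turner et al., "Optimal Policies Tend to Seek Power") defines the POWER of a state as the agent's normalized average optimal value when its reward function is drawn at random from a distribution $\mathcal{D}$. Roughly, the definition there is

$$\text{POWER}_{\mathcal{D}}(s,\gamma) = \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R\sim \mathcal{D}}\!\left[V^*_R(s,\gamma) - R(s)\right]$$

where $V^*_R$ is the optimal value function for reward function $R$ and $\gamma$ is the discount rate. Instrumental convergence then shows up as theorems of the form: for most reward functions drawn from $\mathcal{D}$, optimal policies tend to steer toward high-POWER states, e.g., states from which many futures remain reachable.

To make that concrete, here is a minimal Python sketch of the estimate, on a toy three-state MDP of my own construction (not code or an example from the sequence): state 0 keeps the most options open, so it should come out with the most POWER, while the dead-end state 2 should come out with the least.

import numpy as np

# Toy 3-state deterministic MDP (hypothetical example, not from the sequence).
# successors[s] lists the states reachable from s in one step;
# state 0 keeps the most options open, state 2 is a dead end.
successors = [[0, 1], [1, 2], [2]]
n_states, gamma = 3, 0.9

def optimal_values(reward, iters=300):
    # Standard value iteration for a state-based reward function.
    v = np.zeros(n_states)
    for _ in range(iters):
        v = np.array([reward[s] + gamma * max(v[t] for t in successors[s])
                      for s in range(n_states)])
    return v

# Estimate POWER(s) by averaging (V*_R(s) - R(s)) over sampled reward functions.
rng = np.random.default_rng(0)
gaps = [optimal_values(r) - r
        for r in (rng.uniform(size=n_states) for _ in range(500))]
power = (1 - gamma) / gamma * np.mean(gaps, axis=0)
print(power)  # highest at state 0, lowest at the dead-end state 2

Running this prints the largest estimate for state 0 and the smallest for state 2: keeping options open is, in this formal sense, powerful. The posts below develop when and how strongly this kind of effect holds.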

1. Seeking Power is Often Convergently Instrumental in MDPs (TurnTrout, Logan Riggs)
2. Power as Easily Exploitable Opportunities (TurnTrout)
3. The Catastrophic Convergence Conjecture (TurnTrout)
4. Generalizing POWER to multi-agent games (midco, TurnTrout)
5. MDP models are determined by the agent architecture and the environmental dynamics (TurnTrout)
6. Environmental Structure Can Cause Instrumental Convergence (TurnTrout)
7. A world in which the alignment problem seems lower-stakes (TurnTrout)
8. The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies (TurnTrout)
9. Seeking Power is Convergently Instrumental in a Broad Class of Environments (TurnTrout)
10. When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives (TurnTrout)
11. Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability (TurnTrout)
12. A Certain Formalization of Corrigibility Is VNM-Incoherent (TurnTrout)
13. Instrumental Convergence For Realistic Agent Objectives (TurnTrout)
14. Parametrically retargetable decision-makers tend to seek power (TurnTrout)