AI ALIGNMENT FORUM

The Causes of Power-seeking and Instrumental Convergence

Jul 05, 2021 by TurnTrout

Instrumental convergence posits that smart goal-directed agents will tend to take certain kinds of actions (e.g., gaining resources, staying alive) in order to achieve their goals. These actions often involve taking power from humans, and human disempowerment seems like a key part of how AI might go very, very wrong.

But where does instrumental convergence come from? When does it occur, and how strongly? And what does the math look like?
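
As a taste of the formalism developed in this sequence: the first post (whose paper version is Turner et al., "Optimal Policies Tend to Seek Power") defines the POWER of a state as the agent's normalized average optimal value when its reward function is drawn at random from a distribution $\mathcal{D}$. Roughly, the definition there is

$$\text{POWER}_{\mathcal{D}}(s,\gamma) = \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R\sim \mathcal{D}}\!\left[V^*_R(s,\gamma) - R(s)\right]$$

where $V^*_R$ is the optimal value function for reward function $R$ and $\gamma$ is the discount rate. Instrumental convergence then shows up as theorems of the form: for most reward functions drawn from $\mathcal{D}$, optimal policies tend to steer toward high-POWER states, e.g., states from which many futures remain reachable.

To make that concrete, here is a minimal Python sketch of the estimate, on a toy three-state MDP of my own construction (not code or an example from the sequence): state 0 keeps the most options open, so it should come out with the most POWER, while the dead-end state 2 should come out with the least.

import numpy as np

# Toy 3-state deterministic MDP (hypothetical example, not from the sequence).
# successors[s] lists the states reachable from s in one step;
# state 0 keeps the most options open, state 2 is a dead end.
successors = [[0, 1], [1, 2], [2]]
n_states, gamma = 3, 0.9

def optimal_values(reward, iters=300):
    # Standard value iteration for a state-based reward function.
    v = np.zeros(n_states)
    for _ in range(iters):
        v = np.array([reward[s] + gamma * max(v[t] for t in successors[s])
                      for s in range(n_states)])
    return v

# Estimate POWER(s) by averaging (V*_R(s) - R(s)) over sampled reward functions.
rng = np.random.default_rng(0)
gaps = [optimal_values(r) - r
        for r in (rng.uniform(size=n_states) for _ in range(500))]
power = (1 - gamma) / gamma * np.mean(gaps, axis=0)
print(power)  # highest at state 0, lowest at the dead-end state 2

Running this prints the largest estimate for state 0 and the smallest for state 2: keeping options open is, in this formal sense, powerful. The posts below develop when and how strongly this kind of effect holds.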

1. Seeking Power is Often Convergently Instrumental in MDPs (TurnTrout, Logan Riggs)
2. Power as Easily Exploitable Opportunities (TurnTrout)
3. The Catastrophic Convergence Conjecture (TurnTrout)
4. Generalizing POWER to multi-agent games (midco, TurnTrout)
5. MDP models are determined by the agent architecture and the environmental dynamics (TurnTrout)
6. Environmental Structure Can Cause Instrumental Convergence (TurnTrout)
7. A world in which the alignment problem seems lower-stakes (TurnTrout)
8. The More Power At Stake, The Stronger Instrumental Convergence Gets For Optimal Policies (TurnTrout)
9. Seeking Power is Convergently Instrumental in a Broad Class of Environments (TurnTrout)
10. When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives (TurnTrout)
11. Satisficers Tend To Seek Power: Instrumental Convergence Via Retargetability (TurnTrout)
12. A Certain Formalization of Corrigibility Is VNM-Incoherent (TurnTrout)
13. Instrumental Convergence For Realistic Agent Objectives (TurnTrout)
14. Parametrically retargetable decision-makers tend to seek power (TurnTrout)