AI ALIGNMENT FORUM

Experiments in instrumental convergence

Oct 12, 2022 by Edouard Harris

This sequence investigates instrumental convergence and power-seeking through a series of experiments in multi-agent RL.

The key question we explore: If humans build AIs that learn faster than we do, will those AIs compete with us by default?

Posts in this sequence:

Instrumental convergence in single-agent systems, by Edouard Harris and simonsdsuo
Misalignment-by-default in multi-agent systems, by Edouard Harris and simonsdsuo
Instrumental convergence: scale and physical interactions, by Edouard Harris and simonsdsuo
POWERplay: An open-source toolchain to study AI power-seeking, by Edouard Harris