AI ALIGNMENT FORUM

Power Seeking (AI)

Edited by Raemon. Last updated 24th Oct 2022.

Power Seeking is a property an agent might have: it tries to increase its general ability to control its environment. The concept is particularly relevant to AIs and is closely related to Instrumental Convergence.

Posts tagged Power Seeking (AI)
Instrumental convergence in single-agent systems, by Edouard Harris, simonsdsuo (3y; 16 karma; 4 comments)
Categorical-measure-theoretic approach to optimal policies tending to seek power, by jacek (3y; 16 karma; 0 comments)
POWERplay: An open-source toolchain to study AI power-seeking, by Edouard Harris (3y; 7 karma; 0 comments)
Parametrically retargetable decision-makers tend to seek power, by TurnTrout (3y; 68 karma; 4 comments)
Steering Llama-2 with contrastive activation additions, by Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout (2y; 49 karma; 23 comments)
Eli's review of "Is power-seeking AI an existential risk?", by elifland (3y; 29 karma; 0 comments)
A framework for thinking about AI power-seeking, by Joe Carlsmith (1y; 29 karma; 11 comments)
Power-seeking can be probable and predictive for trained agents, by Vika, janos (3y; 28 karma; 20 comments)
Generalizing the Power-Seeking Theorems, by TurnTrout (5y; 17 karma; 5 comments)
Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake, by TurnTrout (9mo; 20 karma; 1 comment)
[AN #170]: Analyzing the argument for risk from power-seeking AI, by Rohin Shah (4y; 15 karma; 0 comments)
Power-seeking for successive choices, by adamShimi (4y; 10 karma; 9 comments)
My Overview of the AI Alignment Landscape: Threat Models, by Neel Nanda (4y; 23 karma; 3 comments)
Natural Abstraction: Convergent Preferences Over Information Structures, by paulom (2y; 10 karma; 0 comments)
Incentives from a causal perspective, by tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall, Jonathan Richens (2y; 12 karma; 0 comments)