AI ALIGNMENT FORUM

Reframing Impact
Impact Regularization · World Modeling

The Gears of Impact

by TurnTrout
7th Oct 2019
1 min read

Previous:
World State is the Wrong Abstraction for Impact
Next:
Seeking Power is Often Convergently Instrumental in MDPs

Scheduling: The remainder of the sequence will be released after some delay.

Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?

Notes

  • Here, our descriptive theory relies on our ability to have reasonable beliefs about what we'll do, and how things in the world will affect our later decision-making process. No one knows how to formalize that kind of reasoning, so I'm leaving it a black box: we somehow have these reasonable beliefs which are apparently used to calculate AU.
  • In technical terms, AU calculated with the "could" criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error.
    • This is one major reason I'm disambiguating between AU and EU. In the non-embedded context of reinforcement learning, AU is a very particular kind of EU: V*(s), the expected return under the optimal policy.
  • Framed as a kind of EU, we plausibly use AU to make decisions.
  • I'm not claiming normatively that "embedded agentic" EU should be AU; I'm simply using "embedded agentic" as an adjective.
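
The distinction between AU under the "could" criterion (an optimal value function) and actual AU as an on-policy prediction, with felt impact as TD error, can be sketched in a toy MDP. Everything below (the MDP, the fixed policy, and the function names) is my own illustration of the standard RL definitions, not a formalism from the post:

```python
import numpy as np

# Tiny deterministic MDP, purely illustrative: 3 states, 2 actions.
# R[s, a] is the reward for action a in state s; P[s, a] is the next state.
R = np.array([[0.0, 1.0],
              [0.0, 2.0],
              [0.0, 0.0]])
P = np.array([[1, 2],
              [2, 2],
              [2, 2]])
gamma = 0.9

def optimal_values(R, P, gamma, iters=200):
    """Value iteration: V*(s) = max_a [R(s,a) + gamma * V*(P(s,a))] -- the "could" criterion."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = np.max(R + gamma * V[P], axis=1)
    return V

def on_policy_values(R, P, pi, gamma, iters=200):
    """V^pi(s): expected return under the fixed policy pi -- what we predict we *will* do."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = np.array([R[s, pi[s]] + gamma * V[P[s, pi[s]]] for s in range(len(R))])
    return V

V_star = optimal_values(R, P, gamma)        # AU under the "could" criterion
pi = np.array([0, 1, 0])                    # a fixed policy (our predicted behavior)
V_pi = on_policy_values(R, P, pi, gamma)    # actual AU: on-policy prediction

# TD error at state 0: delta = r + gamma * V^pi(s') - V^pi(s).
s, a = 0, pi[0]
r, s_next = R[s, a], P[s, a]
td_error = r + gamma * V_pi[s_next] - V_pi[s]   # zero when nothing surprising happens

# If the world surprises us (say a reward of 1.0 arrives where we expected 0),
# the TD error is nonzero -- the "felt impact" reading:
felt_impact = 1.0 + gamma * V_pi[s_next] - V_pi[s]
```

On converged value estimates the TD error is zero, matching the intuition that an anticipated event carries no felt impact; only the unexpected reward produces a nonzero `felt_impact`.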
Mentioned in
Attainable Utility Landscape: How The World Is Changed
Impact measurement and value-neutrality verification