AI ALIGNMENT FORUM

Reframing Impact
Impact Regularization · World Modeling

The Gears of Impact

by TurnTrout
7th Oct 2019
1 min read

Previous:
World State is the Wrong Abstraction for Impact
Next:
Seeking Power is Often Convergently Instrumental in MDPs

Scheduling: The remainder of the sequence will be released after some delay.

Exercise: Why does instrumental convergence happen? Would it be coherent to imagine a reality without it?

Notes

  • Here, our descriptive theory relies on our ability to have reasonable beliefs about what we'll do, and how things in the world will affect our later decision-making process. No one knows how to formalize that kind of reasoning, so I'm leaving it a black box: we somehow have these reasonable beliefs which are apparently used to calculate AU.
  • In technical terms, AU calculated with the "could" criterion would be closer to an optimal value function, while actual AU seems to be an on-policy prediction, whatever that means in the embedded context. Felt impact corresponds to TD error.
    • This is one major reason I'm disambiguating between AU and EU. In the non-embedded context of reinforcement learning, AU is a very particular kind of EU: V*(s), the expected return under the optimal policy.
  • Framed as a kind of EU, we plausibly use AU to make decisions.
  • I'm not claiming normatively that "embedded agentic" EU should be AU; I'm simply using "embedded agentic" as an adjective.
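
The distinction between AU under the "could" criterion (an optimal value function) and actual AU as an on-policy prediction, with felt impact as TD error, can be sketched in a toy MDP. Everything below (the MDP, the fixed policy, and the function names) is my own illustration of the standard RL definitions, not a formalism from the post:

```python
import numpy as np

# Tiny deterministic MDP, purely illustrative: 3 states, 2 actions.
# R[s, a] is the reward for action a in state s; P[s, a] is the next state.
R = np.array([[0.0, 1.0],
              [0.0, 2.0],
              [0.0, 0.0]])
P = np.array([[1, 2],
              [2, 2],
              [2, 2]])
gamma = 0.9

def optimal_values(R, P, gamma, iters=200):
    """Value iteration: V*(s) = max_a [R(s,a) + gamma * V*(P(s,a))] -- the "could" criterion."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = np.max(R + gamma * V[P], axis=1)
    return V

def on_policy_values(R, P, pi, gamma, iters=200):
    """V^pi(s): expected return under the fixed policy pi -- what we predict we *will* do."""
    V = np.zeros(len(R))
    for _ in range(iters):
        V = np.array([R[s, pi[s]] + gamma * V[P[s, pi[s]]] for s in range(len(R))])
    return V

V_star = optimal_values(R, P, gamma)        # AU under the "could" criterion
pi = np.array([0, 1, 0])                    # a fixed policy (our predicted behavior)
V_pi = on_policy_values(R, P, pi, gamma)    # actual AU: on-policy prediction

# TD error at state 0: delta = r + gamma * V^pi(s') - V^pi(s).
s, a = 0, pi[0]
r, s_next = R[s, a], P[s, a]
td_error = r + gamma * V_pi[s_next] - V_pi[s]   # zero when nothing surprising happens

# If the world surprises us (say a reward of 1.0 arrives where we expected 0),
# the TD error is nonzero -- the "felt impact" reading:
felt_impact = 1.0 + gamma * V_pi[s_next] - V_pi[s]
```

On converged value estimates the TD error is zero, matching the intuition that an anticipated event carries no felt impact; only the unexpected reward produces a nonzero `felt_impact`.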
Mentioned in
Attainable Utility Landscape: How The World Is Changed
Impact measurement and value-neutrality verification