Clarifying Power-Seeking and Instrumental Convergence

You say:

"most agents stay alive in Pac-Man and postpone ending a Tic-Tac-Toe game", but only in the limit of farsightedness (γ→1).

I think there are two separable concepts at work in these examples: the success of an agent, and the agent's choices as determined by its reward function and its farsightedness.

If we compare two agents, one in the limit of farsightedness (γ→1) and the other with half that farsightedness (γ = 1/2), then I expect the first agent to be more successful across a uniform distribution of reward functions and to skip over things like Trade School, whereas the second agent, given its more limited farsightedness, would be more successful if it sought power. As Vanessa Kosoy said above,

... gaining power is more robust to inaccuracies of the model or changes in the circumstances than pursuing more "direct" paths to objectives.

What I meant originally is this: if an agent doesn't know whether γ→1, is it not still true that the agent "seeks out the states in the future with the most resources or power"? Certainly, the agent can get stuck at a local maximum because of shortsightedness, and it can forgo certain options as a result of its farsightedness.

So I am interpreting the theorem like so:

An agent seeks out states in the future that have more power at the limit of its farsightedness, but not states that, while they have more power, are below its farsightedness "rating."

Note: This assumes a uniform distribution over reward functions.
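
To make the dependence on γ concrete, here is a minimal sketch of a toy MDP of my own (it is not from the post, and every name in it is hypothetical). From the start state the agent either moves to a dead end, where it collects one reward forever, or to a hub, from which it can pick the best of three absorbing states. Rewards are assumed to be drawn independently and uniformly from [0, 1] per state. Under those assumptions the fraction of reward functions for which the hub is optimal works out to 0.5 + 0.25γ, so the preference for the state with more options strengthens as γ grows.

```python
import random


def hub_is_optimal(r_dead, r_hub, r_leaves, gamma):
    """Return True if moving to the hub beats moving to the dead end.

    Assumed toy model: reward is collected in every state occupied,
    and gamma must be strictly less than 1.
    """
    # Dead end: collect r_dead at every step, forever.
    v_dead = r_dead / (1 - gamma)
    # Hub: collect r_hub once, then sit in the best absorbing leaf forever.
    v_hub = r_hub + gamma * max(r_leaves) / (1 - gamma)
    return v_hub > v_dead


def fraction_preferring_hub(gamma, n_samples=20000, seed=0):
    """Monte Carlo estimate over uniformly sampled reward functions."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_samples):
        r_dead = rng.random()
        r_hub = rng.random()
        r_leaves = [rng.random() for _ in range(3)]
        if hub_is_optimal(r_dead, r_hub, r_leaves, gamma):
            count += 1
    return count / n_samples


if __name__ == "__main__":
    for gamma in (0.0, 0.5, 0.99):
        estimate = fraction_preferring_hub(gamma)
        print(f"gamma={gamma}: hub preferred for ~{estimate:.3f} of reward functions")
```

In this toy, even a fully myopic agent picks the hub half the time, simply because the hub's own reward is as likely as not to beat the dead end's. The extra pull toward the hub comes entirely from the options behind it, and that pull is weighted by γ.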

Clarifying Power-Seeking and Instrumental Convergence

If an agent is randomly placed in a given distribution of randomly connected points, I see why there are diminishing returns on seeking more power, but that return is never 0, is it?

This gives me pause.
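
One way to put numbers on the diminishing-returns intuition, under an assumption of my own: treat "power" as the expected reward of the best of n reachable end states, each with an independent uniform reward on [0, 1]. The expected maximum of n such draws is n/(n+1), so the gain from one more option is 1/((n+1)(n+2)), which shrinks but stays strictly positive. The function names below are hypothetical.

```python
from fractions import Fraction


def expected_best(n):
    """Expected maximum of n independent Uniform[0, 1] rewards: n / (n + 1)."""
    return Fraction(n, n + 1)


def marginal_gain(n):
    """Expected gain from having n + 1 reachable options instead of n."""
    return expected_best(n + 1) - expected_best(n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 100):
        print(f"n={n}: gain from one more option = {marginal_gain(n)}")
```

This only covers the far-sighted limit with independent uniform rewards. Under other reward distributions, or with a cost for reaching the extra options, the marginal return could behave differently.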