Here's an idea (inspired by "Less exploitable value-updating agent") of how to model an agent that does nothing but gather resources.

It's an agent which has a utility function $v = Xu$, for some utility function $u$. It doesn't know whether $X = 1$ or $X = -1$ (maybe we used an approach like "Safe probability manipulation, superweapons, and stable self-improvement research" to guarantee ignorance), but it will find out tomorrow.

In the meantime, it will not seek to influence the value of $u$ (as any action increasing $u$ also decreases $-u$ by the same amount), but will seek to gather as many resources as possible, to put itself in a position to act once it knows the value of $X$.

Now, this definition is somewhat dependent on the definition of $X$ (eg: it would certainly want to be elected "president of the committee for setting the value of $X$" more than anything else), so a more thorough description might be some situation where the agent is completely ignorant about its future utility (but where the ignorance is symmetric; ie the probability of $u$ for any $u$ is the same as for $-u$). This could be a pure resource gathering agent.
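To make the argument concrete, here is a minimal toy sketch, not a definitive model. It assumes the two candidate utilities are u and -u, written as v = X*u with X equal to +1 or -1, and that the agent assigns probability `p_plus` to X = +1. The `Action` class, the three example actions, and the `gain_per_resource` parameter are all hypothetical, introduced only for illustration. Under the symmetric prior, pushing u in either direction has zero expected value today, while resources pay off tomorrow whichever sign X turns out to have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    delta_u: float    # change in u caused directly by acting today
    resources: float  # resources gathered today, spendable once X is known


def expected_value(action, p_plus=0.5, gain_per_resource=1.0):
    """Expected value of v = X*u, where X = +1 with probability p_plus, else -1.

    Today's change contributes X * delta_u, with expectation (2*p_plus - 1) * delta_u.
    Tomorrow the agent knows X and spends its resources pushing u in the
    direction X favours, so resources are worth the same for either sign
    (a simplifying assumption of this toy model).
    """
    today = (2 * p_plus - 1) * action.delta_u
    tomorrow = gain_per_resource * action.resources
    return today + tomorrow


# Hypothetical action set for illustration only.
ACTIONS = [
    Action("raise u", delta_u=5.0, resources=0.0),
    Action("lower u", delta_u=-5.0, resources=0.0),
    Action("gather resources", delta_u=0.0, resources=3.0),
]


def best_action(p_plus=0.5):
    """Name of the action with the highest expected value under the given prior."""
    return max(ACTIONS, key=lambda a: expected_value(a, p_plus)).name
```

With `p_plus=0.5` the sketch picks "gather resources". With a lopsided prior such as `p_plus=0.95` it picks "raise u" instead, so the pure resource gathering behaviour in this toy model relies on the ignorance being symmetric.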

Why could this be interesting? Well, I was wondering if we could take a generic agent and somehow "subtract off" a pure resource gathering agent from it. The resulting agent would pursue its own goals, while also minimising how well it would do if it were a pure resource gathering agent.

The idea needs some developing, but there might be something there.

1 comment:

Doesn't seem workable to me: being "completely ignorant" suggests an improper prior. An agent with a proper prior over its utility function can integrate over it and maximize expected utility, and which action maximizes expected utility will depend on this prior.