Two agents can have the same source code and optimise different utility functions

Joar Skalse

Two agents can have the same source code and optimise different utility functions

1 min read10th Jul 20182 comments

6

Frontpage

I believe that it is possible for two agents to have the exact same source code while (in some sense) optimising two different utility functions.

I take a utility function to be a function from possible world states to real numbers. When I say that an agent is optimising a utility function I mean something like that the agent is "pushing" its environment towards states with higher values according to said utility function. This concept is not entirely unambiguous, but I don't think its necessary to try to make it more explicit here. By source code I mean the same thing as everyone else means by source code.

Now, consider an agent which has a goal like "gain resources" (in some intuitive sense). Say that two copies of this agent are placed in a shared environment. These agents will now push the environment towards different states, and are therefore (under the definition I gave above) optimising different utility functions.

Mentioned in

8Alignment Newsletter #15: 07/16/18

Two agents can have the same source code and optimise different utility functions

10th Jul 2018

8Robert Miles

4Linda Linsefors

New Comment

2 comments, sorted by

top scoring

Click to highlight new comments since: Today at 12:48 AM

[-]Robert Miles6y60

Makes sense. It seems to flow from the fact that the source code is in some sense allowed to use concepts like 'Me' or 'I', which refer to the agent itself. So both agents have source code which says "Maximise the resources that I have control over", but in Agent 1 this translates to the utility function "Maximise the resources that Agent 1 has control over", and in Agent 2 this translates to the different utility function "Maximise the resources that Agent 2 has control over".

So this source code thing that we're tempted to call a 'utility function' isn't actually valid as a mapping from world states to real numbers until the agent is specified, because these 'Me'/'I' terms are undefined.

[-]Linda Linsefors6y40

I agree.

An even simpler example: If the agents are reward learners, both of them will optimize for their own reward signal, which are two different things in the physical world.

Moderation Log