Strong implication of preference uncertainty

12th Aug 2020

Here is a theory that is just as good as general relativity:

AGR (Angel General Relativity): Tiny invisible angels push around all the particles in the universe in a way that is indistinguishable from the equations of general relativity.

This theory is falsifiable, just as general relativity (GR) itself is. Indeed, since it gives exactly the same predictions as GR, a Bayesian will never find evidence that prefers it over Einstein's theory.

Therefore, I obviously deserve a Nobel prize for suggesting it.

Enter Occam's shaving equipment

Obviously the angel theory is not a revolutionary new theory. Partially because I've not done any of the hard work, just constructed a pointer to Einstein's theory. But, philosophically, the main justification is Occam's razor - the simplest theory is to be preferred.

From a Bayesian perspective, you could see violations of Occam's razor as cheating, using your posterior as priors. There is a whole class of "angels are pushing particles" theories, and AGR is just a small portion of that space. By considering AGR and GR on equal footing, we're privileging AGR above what it deserves^[1].

In physics, Occam's razor doesn't matter for strictly identical theories

Occam's razor has two roles: the first is to distinguish between strictly identical theories; the second is to distinguish between theories that give the same prediction on the data so far, but may differ in the future.

Here, we focus on the first case: GR and AGR are strictly identical; no data will ever distinguish them. In essence, the theory that one is right and the other wrong is not falsifiable.

What that means is that, though AGR may be a priori less likely than GR, the relative probability between the two theories will never change: they make the same predictions. And also because they make the same predictions, that relative probability is irrelevant in practice: we could use AGR just as well as GR for predictions.
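A minimal sketch of why the relative probability never moves: in odds form, each observation multiplies the odds by the ratio of the two likelihoods, and for strictly identical theories that ratio is always 1. The prior odds and likelihood values below are made-up numbers chosen for illustration, and `posterior_odds` is a hypothetical helper name.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihoods_a, likelihoods_b):
    """Update the odds of theory A against theory B on a run of observations.

    likelihoods_a[i] and likelihoods_b[i] are the probabilities that each
    theory assigns to the i-th observation actually made.
    """
    odds = prior_odds
    for p_a, p_b in zip(likelihoods_a, likelihoods_b):
        odds *= p_a / p_b  # Bayes factor for this observation
    return odds


# Assumption: AGR starts out a million times less likely than GR.
prior_odds_agr_vs_gr = Fraction(1, 1_000_000)

# AGR and GR assign the same probability to every observation (made-up values).
shared_likelihoods = [Fraction(9, 10), Fraction(1, 2), Fraction(99, 100)]

final_odds = posterior_odds(
    prior_odds_agr_vs_gr, shared_likelihoods, shared_likelihoods
)
print(final_odds == prior_odds_agr_vs_gr)
```

However many observations are added to `shared_likelihoods`, every Bayes factor is 1, so the odds stay wherever the prior put them.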

How preferences differ

Now let's turn to preferences, as described in our paper "Occam's razor is insufficient to infer the preferences of irrational agents".

Here two sets of preferences are "prediction-identical", in the sense of the physics theories above, if they predict the same behaviour for the agent. So that means that two different preference-based explanations for the same behaviour will never change their relative probabilities.

Worse than that, Occam's razor doesn't solve the issue. The simplest explanation of, say, human behaviour is that humans are fully rational at all times. This isn't the explanation that we want.

Even worse than that, prediction-identical preferences will lead to vastly different consequences if we program an AI to maximise them.
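To make this concrete, here is a toy sketch, not taken from the paper's formalism. It assumes an agent is described by a (planner, reward) pair, and every name in it (`ACTIONS`, `REWARD`, `rational_planner`, `anti_rational_planner`) is hypothetical. A rational planner with reward R and an anti-rational planner with reward -R produce the same behaviour, yet an AI handed R or -R to maximise does opposite things.

```python
# Toy illustration: the actions, rewards and planners are all hypothetical.
ACTIONS = ["eat_cake", "eat_salad"]

REWARD = {"eat_cake": 1.0, "eat_salad": 0.0}
NEG_REWARD = {action: -value for action, value in REWARD.items()}


def rational_planner(reward):
    """Pick the action with the highest reward."""
    return max(ACTIONS, key=lambda action: reward[action])


def anti_rational_planner(reward):
    """Pick the action with the lowest reward."""
    return min(ACTIONS, key=lambda action: reward[action])


# Two prediction-identical explanations of the same observed agent.
behaviour_1 = rational_planner(REWARD)
behaviour_2 = anti_rational_planner(NEG_REWARD)

# An AI that simply maximises whichever reward it was handed.
ai_choice_1 = rational_planner(REWARD)
ai_choice_2 = rational_planner(NEG_REWARD)

print(behaviour_1 == behaviour_2)  # the agent's behaviour cannot tell them apart
print(ai_choice_1 == ai_choice_2)  # the maximising AI's choices can
```

No observation of the agent separates the two explanations, so the choice between them rests entirely on the prior, while the AI's actions depend heavily on which one is chosen.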

So, in summary:

Prediction-identical preferences never change relative probability.
The simplest prediction-identical preferences are known to be wrong for humans.
It could be very important for the future to get the right preferences for humans.

[1] GR would make up a larger portion of $G$, the "geometric theories of space-time", than AGR makes up of $A$, the "angels are pushing particles" theories, and $G$ would be more likely than $A$ anyway, especially after updating on the non-observation of angels.

Donald Hobson (5y):

"And also because they make the same predictions, that relative probability is irrelevant in practice: we could use AGR just as well as GR for predictions."

There is a subtle sense in which the difference between AGR and GR is relevant. While the difference doesn't change the predictions, it may change the utility function. An agent that cares about angels (if they exist) might do different things if it believes itself to be in an AGR world than in a GR world. As the theories make identical predictions, the agent's belief depends only on its priors (and any irrationality), not on which world it is in. Nonetheless, this means that the agent will pay to avoid having its priors modified, even though the modification doesn't change the agent's predictions in the slightest.

AI ALIGNMENT FORUM