This post owes credit to discussions with Caspar Oesterheld, Scott Garrabrant, Sahil, Daniel Kokotajlo, Martín Soto, Chi Nguyen, Lukas Finnveden, Vivek Hebbar, Mikhail Samin, and Diffractor. Inspired in particular by a discussion with cousin_it.
Short version: The idea here is to combine the Nash bargaining solution with Wei Dai's UDT to give an explicit model of some ideas from Open-Minded Updatelessness by Nicolas Macé, Jesse Clifton, and SMK.

- Start with a prior divided into hypotheses $H_1, \dots, H_n$.
- BATNA: the updateful policy $\pi_u$.
- Actual policy choice maximizes the product of gains from trade across hypotheses:

$$\pi^* \;=\; \arg\max_\pi \; \prod_i \Big( \mathbb{E}[U_i \mid \pi, H_i] \;-\; \mathbb{E}[U_i \mid \pi_u, H_i] \Big)$$

where $U_i$ is the utility function asserted by hypothesis $H_i$.
I wrote an essay arguing for open-minded updatelessness last year, but it was mostly motivation, and lacked a concrete proposal. The present essay fills that gap. I don't think the proposal is perfect, but it does address the problems I raised.
I'm calling my proposal Geometric UDT, due to taking inspiration from Scott Garrabrant's Geometric Rationality, although the connection is a bit weak; what I'm doing is VNM-rational, whereas Geometric Rationality is not.
You might be persuaded by reflective consistency concerns, and conclude that if humanity builds machine superintelligence, it had better use updateless decision theory (UDT)[1].[2] You might separately be convinced that machine superintelligence needs to do some kind of value learning, in order to learn human values, because human values are too difficult for humans to write down directly.[3]
You might then wonder whether these two beliefs stand in contradiction. "Value learning" sounds like it involves updating on evidence. UDT involves never updating on evidence. Is there a problem here?
"Maybe not," you might hope. "UDT is optimal in strictly more situations than updateful DT. This means UDT behaves updatefully in cases where behaving updatefully is optimal, even though it doesn't actually update.[4] Maybe value learning is one of those cases!"
You might do the math as follows:
Suppose that, before building machine superintelligence, we narrow down human values to two possibilities: puppies, or rainbows. We aren't sure which of these represents human values, but we're sure it is one of the two, and we assign them even odds.
In rainbow-world, tiling the universe with rainbows has 100 utility, and tiling the universe with puppies has 0 utility. In puppy-world, tiling the universe with puppies has 90 utility (because puppies are harder to maximize than rainbows), but rainbows have 0 utility.
|                   | Rainbow World | Puppy World |
|-------------------|---------------|-------------|
| Maximize Rainbows | +100          | 0           |
| Maximize Puppies  | 0             | +90         |
The machine superintelligence will be able to observe enough information about humans to distinguish puppy-world from rainbow-world within 3 seconds of being switched on. There are four policies it could follow:

| Policy | Rainbow World | Puppy World | Expected Utility |
|--------|---------------|-------------|------------------|
| Always maximize rainbows | +100 | 0 | 50 |
| Always maximize puppies | 0 | +90 | 45 |
| Rainbows in rainbow-world, puppies in puppy-world | +100 | +90 | 95 |
| Puppies in rainbow-world, rainbows in puppy-world | 0 | 0 | 0 |
The highest EV is to do the obvious value-learning thing; so, there's no problem. UDT behaves updatefully, as is intuitively desirable!
Unfortunately, not all versions of this problem work out so nicely.
Some hypotheses will "play nice" like the example above, and updateless value learning will work fine.
However, there are some versions of "valuing puppies" and "valuing rainbows" which value puppies/rainbows regardless of which universe the puppies/rainbows are in. These utility functions are called "nosy neighbors" because they care about what happens in other realities, not just their own.[5] (We could call the non-nosy hypotheses "nice neighbors".)
Nosy Neighbors: Technical Details
Utility functions are standardly modeled as random variables. A random variable is a function from worlds (aka "outcomes") to real numbers; you feed the world to the utility function, and the utility function outputs a score for the world.
A world, in turn, can be understood as a truth-valuation: a function from propositions to true/false. Feed a proposition to a world, and the world will tell you whether the proposition was true or false.[6] You can imagine the utility function looking at the assignment of propositions to true/false in order to compute the world's score. For example, at each location L, the utility function could check the proposition puppy(L), which is true if there's a puppy at L. The score of a world could be the total number of true instances of puppy(L).[7]
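To make this concrete, here's a minimal Python sketch (the number of locations and proposition names like `puppy(L)` are just illustrative): a world is a truth-valuation over propositions, and the puppy-counting utility function is a map from worlds to scores.

```python
from typing import Dict

# A world is a truth-valuation: a map from propositions (here, strings) to True/False.
World = Dict[str, bool]

LOCATIONS = range(3)  # a tiny toy universe with three locations

def puppy_utility(world: World) -> float:
    """Score a world by counting the true instances of puppy(L)."""
    return sum(1.0 for L in LOCATIONS if world.get(f"puppy({L})", False))

# A world with puppies at locations 0 and 2 scores 2.0:
example_world = {"puppy(0)": True, "puppy(1)": False, "puppy(2)": True}
print(puppy_utility(example_world))  # 2.0
```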
In order to model uncertainty between utility functions, what I'm really suggesting is a utility function which behaves like one function or another depending on some fact. When I describe "puppy world" vs "rainbow world", I have in mind that there is some fact we are uncertain about (the fact of "what human values are") which is one way in case humans value puppies, and another way in case humans value rainbows. The utility function encodes our value uncertainty by first checking this fact, and then, if it is one way, proceeding to act like the puppy-valuing utility function (scoring the world based on puppies); otherwise, it acts like the rainbow-valuing function (scoring the world based on rainbows).
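Continuing the toy sketch, the uncertain utility function can be written as a single function that first checks the value-fact and then scores the world accordingly (the proposition name `humans_value_puppies` is just a stand-in for whatever fact settles what human values are):

```python
from typing import Dict

World = Dict[str, bool]
LOCATIONS = range(3)

def count(world: World, predicate: str) -> float:
    """Count the true instances of predicate(L) across locations."""
    return sum(1.0 for L in LOCATIONS if world.get(f"{predicate}({L})", False))

def uncertain_utility(world: World) -> float:
    """Check the value-fact first, then score the world accordingly."""
    if world.get("humans_value_puppies", False):
        return count(world, "puppy")    # behave like the puppy-valuing function
    return count(world, "rainbow")      # behave like the rainbow-valuing function

# In a puppy-world, rainbows contribute nothing to the score:
w = {"humans_value_puppies": True, "puppy(0)": True, "rainbow(1)": True}
print(uncertain_utility(w))  # 1.0
```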
Naively, you might think that this means "nosy neighbors" don't make sense: the score of a world only depends on stuff in that world. With our uncertain utility function, the puppy-score is only computed in puppy-world. It can't check for puppies in rainbow-world. What could I possibly mean, when I say that the utility function "values puppies/rainbows regardless of which universe the puppies/rainbows are in"?
What I have in mind is that we believe in some sort of counterfactual information. In puppy-world, there should be a fact of the matter about "what would the machine superintelligence have done in rainbow world" -- some propositions tracking this contingency.
You don't have to be a realist about counterfactuals or possible worlds (a "modal realist") in order to believe this. There doesn't literally have to be a fact of the matter of what actually would happen in a non-real world. There just have to be good proxies, to cause problems. For example, there could be a fact of the matter of what happens when someone thinks through "what would have happened in rainbow-world?" in detail. Maybe at some point someone runs a detailed simulation of rainbow-world within puppy-world. This is sufficient to cause a problem.
This sort of thing is quite difficult to rule out, actually. For example, if you grant that mathematical propositions are true/false, that is enough: mathematics will have, somewhere inside it, a specification of both rainbow-world and puppy-world (or at least, adequately good approximations thereof).
I think of such propositions as "windows to other worlds" which exist inside a single world. Nosy Neighbor hypotheses are utility functions which depend on those propositions.[8]
If you care about what happens in the Game of Life (in the abstract, not just instances you see in front of you) then your values are nosy-neighbor values. If a puppy-loving utility hypothesis checks for puppy-like structures in Game of Life and adds them to the score, that's a nosy-neighbor hypothesis.
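Here's a toy sketch of what a nosy-neighbor hypothesis might look like in the same representation; the `sim_puppy(L)` propositions are my illustrative stand-in for "windows to other worlds", e.g. puppy-like structures in a detailed simulation (or a Game of Life) running inside the actual world:

```python
from typing import Dict

World = Dict[str, bool]
LOCATIONS = range(3)

def count(world: World, predicate: str) -> float:
    """Count the true instances of predicate(L) across locations."""
    return sum(1.0 for L in LOCATIONS if world.get(f"{predicate}({L})", False))

def nosy_puppy_utility(world: World) -> float:
    """A nosy-neighbor puppy hypothesis: it scores ordinary puppies AND puppy-like
    structures seen through 'windows to other worlds' -- here, propositions about
    a detailed simulation running inside the actual world."""
    return count(world, "puppy") + count(world, "sim_puppy")

# A world with no actual puppies, but containing a detailed simulation of puppy-world:
w = {"puppy(0)": False, "sim_puppy(0)": True, "sim_puppy(1)": True}
print(nosy_puppy_utility(w))  # 2.0 -- the hypothesis still cares about those puppies
```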
Suppose the puppy hypothesis and the rainbow hypothesis are both nosy neighbors. I'll assume they're nosy enough that they value puppies/rainbows in other universes exactly as much as in their own. There are four policies:

| Policy | Rainbow World | Puppy World | Expected Utility |
|--------|---------------|-------------|------------------|
| Always maximize rainbows | +200 | 0 | 100 |
| Always maximize puppies | 0 | +180 | 90 |
| Rainbows in rainbow-world, puppies in puppy-world | +100 | +90 | 95 |
| Puppies in rainbow-world, rainbows in puppy-world | +100 | +90 | 95 |
Hence, in this scenario, UDT will choose to always make rainbows.
This shows that in the presence of nosy neighbors, the naive concern can be vindicated: UDT has trouble with "value learning".
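As a sanity check, here's a small Python sketch that recomputes the expected values of the four policies from the stated payoffs and the 50/50 prior, with and without the nosy-neighbor assumption:

```python
WORLDS = ["rainbow", "puppy"]            # equally probable, per the 50/50 prior
PAYOFF = {"rainbow": 100, "puppy": 90}   # value of tiling a universe with the favored thing

# A policy maps the observed world to the thing the AI tiles that universe with.
POLICIES = {
    "always rainbows": {"rainbow": "rainbow", "puppy": "rainbow"},
    "always puppies":  {"rainbow": "puppy",   "puppy": "puppy"},
    "value learning":  {"rainbow": "rainbow", "puppy": "puppy"},
    "anti-learning":   {"rainbow": "puppy",   "puppy": "rainbow"},
}

def expected_value(policy, nosy):
    ev = 0.0
    for actual in WORLDS:  # the world that turns out to be actual (probability 1/2 each)
        # The correct value hypothesis is the one matching the actual world.
        if nosy:
            # Nosy neighbor: favored tilings count in *every* universe, not just the actual one.
            score = sum(PAYOFF[actual] for w in WORLDS if policy[w] == actual)
        else:
            # Non-nosy: only the actual universe's tiling matters.
            score = PAYOFF[actual] if policy[actual] == actual else 0
        ev += 0.5 * score
    return ev

for nosy in (False, True):
    evs = {name: expected_value(p, nosy) for name, p in POLICIES.items()}
    print(f"nosy={nosy}:", evs, "-> best:", max(evs, key=evs.get))
# nosy=False: value learning wins with EV 95
# nosy=True:  always rainbows wins with EV 100
```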
One response to this problem might be to assign nosy-neighbor hypotheses probability zero. The issue I have with this solution is that human values may well be nosy.
Instead, I propose we treat this as a bargaining problem between the hypotheses.
I'll use the Nash Bargaining Solution. This has some particularly nice properties, but the choice here is mostly out of habit, and we could certainly consider other options.
We start by choosing a BATNA (Best Alternative to Negotiated Agreement). This is a policy which we'd default to if the parties at the bargaining table (in our case, the hypotheses) were not able to come to an agreement.
My proposal is to set the BATNA to the policy an updateful decision theory would choose. For example, we could use updateful EDT, or we could use Thompson sampling based on the updated hypothesis weights. Either way, a hypothesis gets "more control" as it becomes more probable. This starting-point for bargaining gives each hypothesis a minimal guarantee of expected utility: puppy-world will get at least as much expected utility as if humans had built an updateful machine superintelligence instead of an updateless one.
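Here's a minimal sketch of the two BATNA options just mentioned, using made-up post-observation weights for the toy hypotheses: updateful EDT picks the posterior-best action, while Thompson sampling draws a hypothesis in proportion to its posterior weight and lets it choose. Either way, a more probable hypothesis gets more control.

```python
import random

# Toy post-observation state: posterior weights over the value hypotheses
# (made-up numbers), and each hypothesis's score for each available action.
posterior = {"rainbow-values": 0.9, "puppy-values": 0.1}
action_scores = {
    "make rainbows": {"rainbow-values": 100, "puppy-values": 0},
    "make puppies":  {"rainbow-values": 0,   "puppy-values": 90},
}

def updateful_edt(posterior, action_scores):
    """Pick the action that maximizes posterior-expected utility."""
    def post_eu(action):
        return sum(posterior[h] * score for h, score in action_scores[action].items())
    return max(action_scores, key=post_eu)

def thompson_sample(posterior, action_scores):
    """Sample a hypothesis in proportion to its posterior weight, then let it choose."""
    h = random.choices(list(posterior), weights=list(posterior.values()))[0]
    return max(action_scores, key=lambda a: action_scores[a][h])

print(updateful_edt(posterior, action_scores))    # 'make rainbows' at these weights
print(thompson_sample(posterior, action_scores))  # usually 'make rainbows', occasionally 'make puppies'
```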
Once we've chosen a BATNA, the Nash Bargaining Solution maximizes the product of the gains-from-trade:

$$\pi^* \;=\; \arg\max_\pi \; \prod_i \Big( \mathbb{E}[U_i \mid \pi, H_i] \;-\; \mathbb{E}[U_i \mid \pi_u, H_i] \Big)$$

Here $H_i$ ranges over the hypotheses, $U_i$ is the utility function hypothesis $H_i$ asserts, and $\pi_u$ is the BATNA policy chosen above.
This ensures a Pareto-optimal policy (unlike the updateful policy). This can only increase the expected utility every individual hypothesis expects to get, in comparison to what they'd get from the updateful policy. You can imagine that the hypotheses are trading away some things they'd control (in the updateful policy) in exchange for consideration from others. The puppy-maximizer might produce some rainbows in exchange for the rainbow-maximizer producing some puppies, but you won't get an all-rainbow policy like we did before, because we're not allowed to completely ignore any hypothesis.
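To see the bargaining step in action on the nosy puppy/rainbow example, here's a toy grid search over policies contingent on the observation, taking the updateful (value-learning) policy as the BATNA. The percent parameterization is my own and everything is hard-coded to the payoffs above.

```python
# Toy grid search for the Nash bargaining step in the nosy puppy/rainbow example.
# A contingent policy is parameterized in whole percents:
# (r_R, r_P) = chance of tiling with rainbows after observing rainbow-world / puppy-world.

def utilities(r_R, r_P):
    """Expected utility of each *nosy* hypothesis under a contingent policy;
    nosy hypotheses count their favored tiling in both universes."""
    u_rainbow = 100 * (r_R + r_P) / 100               # 100 per universe tiled with rainbows
    u_puppy = 90 * ((100 - r_R) + (100 - r_P)) / 100  # 90 per universe tiled with puppies
    return u_rainbow, u_puppy

# BATNA: the updateful policy -- rainbows in rainbow-world, puppies in puppy-world.
batna = utilities(100, 0)  # (100.0, 90.0)

def nash_product(r_R, r_P):
    gains = [u - b for u, b in zip(utilities(r_R, r_P), batna)]
    # Only Pareto improvements over the BATNA are admissible.
    return gains[0] * gains[1] if min(gains) >= 0 else float("-inf")

best = max(((r_R, r_P) for r_R in range(101) for r_P in range(101)),
           key=lambda p: nash_product(*p))

print(nash_product(100, 0))    # 0.0  -- the value-learning BATNA itself
print(nash_product(100, 100))  # -inf -- "always rainbows" drops the puppy hypothesis below its BATNA
print(nash_product(*best))     # 0.0  -- no strict Pareto improvement exists in this toy case
```

In this particular example there is no strict Pareto improvement over the BATNA (any policy that gives the rainbow hypothesis more must give the puppy hypothesis less), so the bargain just ratifies a policy utility-equivalent to value learning; the important point is that "always rainbows" is excluded, because it pushes the puppy hypothesis below its guarantee.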
Another reasonable option is to adjust the weight of each hypothesis based on its probability, raising each hypothesis's gain to the power of its prior probability $p_i$:

$$\pi^* \;=\; \arg\max_\pi \; \prod_i \Big( \mathbb{E}[U_i \mid \pi, H_i] \;-\; \mathbb{E}[U_i \mid \pi_u, H_i] \Big)^{p_i}$$
The probability of a hypothesis already adjusts what it gets in the BATNA, but this additional adjustment makes the solution less dependent on how we partition the probability distribution into hypotheses.
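As a quick illustration of the partition point: if a hypothesis with probability $p_i$ and gain $g_i(\pi)$ is split into two identical copies, each carrying probability $p_i/2$, the probability-weighted product is unchanged, whereas the unweighted product would square that hypothesis's influence:

$$\big(g_i(\pi)\big)^{p_i/2}\cdot\big(g_i(\pi)\big)^{p_i/2} \;=\; \big(g_i(\pi)\big)^{p_i}, \qquad \text{whereas} \qquad g_i(\pi)\cdot g_i(\pi) \;=\; \big(g_i(\pi)\big)^{2}.$$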
Since the chosen policy will be Pareto-optimal, Complete Class Theorems guarantee that it can be interpreted as maximizing expected utility with respect to some prior; that is, it'll still be UDT-rational. The interesting thing is that it isn't UDT-optimal with respect to our honest prior (the one we use to define the BATNA). Geometric UDT advises us to use a fair prior (in the Nash Bargaining Solution sense) instead of our honest prior. The honest prior has a chance of completely screwing over humans (completely ignoring the correct value-hypothesis even in the face of overwhelming evidence), whereas the fair prior does not.
I've framed things in this post in terms of value uncertainty, but I believe everything can be re-framed in terms of uncertainty about what the correct prior is (which connects better with the motivation in my previous post on the subject).
One issue with Geometric UDT is that it doesn't do very well in the presence of some utility hypotheses which are exactly or approximately negative of others: even if there is a Pareto-improvement, the presence of such enemies prevents us from maximizing the product of gains-from-trade, so Geometric UDT is indifferent between such improvements and the BATNA. This can probably be improved upon.
In this essay, UDT means UDT 1.1.
I'm not claiming there's a totally watertight argument for this conclusion given this premise; I'm only claiming that if you believe something like this you should probably care about what I'm doing here.
Even a simple strategy like "trust whatever the humans tell you about what they want" counts as value learning in this sense; the important thing is that the AI system doesn't start out totally confident about what humans want, and it observes things that let it learn.
This isn't a strictly true mathematical assertion; in reality, we need to make more assumptions in order to prove such a theorem (eg, I'm not defining what 'optimality' means here). The point is more that this is the sort of thing someone who is convinced of UDT is inclined to believe (they'll tend to be comfortable with making the appropriate additional assumptions such that this is true).
This phenomenon was discovered and named by Diffractor (private communication).
In logic, it is more typical to understand a world as a truth-valuation like this, so that worlds are functions from propositions to {true, false}. In probability, it is more typical to reverse things, treating a proposition (aka "event") as a set of worlds, so that given a world, you can check if it is in the set (so a proposition can be thought of as a function from worlds to {true, false}).
This distinction doesn't matter very much, at least not for our purposes here.
This particular utility function will not be well-defined if there are infinitely many locations, since the sum could fail to converge. There are many possible solutions to this problem, but the discussion goes beyond the present topic.
We can also get a nosy-neighbor effect without putting terminal utility on other worlds, if we believe that what happens in other worlds impacts our world. For example, maybe in puppy-world, a powerful being named Omega simulates what happens in rainbow-world, and creates or destroys some puppies accordingly. Caring about what happens in other worlds is then induced indirectly through beliefs.