Maybe I should just let you tell me what framework you are even using in the first place.

I'm looking at Savage's theory in your own https://plato.stanford.edu/entries/decision-theory/ and I see U(f) = ∑_i u(f(s_i)) P(s_i), so at least it has no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega = S (and also O = S), but to me Omega = dom(u) = O.

Furthermore, if O = {o_0, o_1}, then I can group the terms into u(o_0)P("we're in a state where f evaluates to o_0") + u(o_1)P("we're in a state where f evaluates to o_1"). I'm just moving all of the complexity out of EU and into P, which I assume works by some magic (e.g. LI) that doesn't involve literally iterating over every possible state in S.
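A minimal sketch of that regrouping, with a made-up three-state example (the states, the act f, and all the numbers are hypothetical), just to show the two sums agree:

```python
# U(f) = sum_i u(f(s_i)) * P(s_i), regrouped by outcome as sum_o u(o) * P(f evaluates to o).

states = ["s1", "s2", "s3"]                # hypothetical states
P = {"s1": 0.5, "s2": 0.3, "s3": 0.2}      # probability of each state
f = {"s1": "o0", "s2": "o1", "s3": "o0"}   # the act: state -> outcome
u = {"o0": 1.0, "o1": 0.0}                 # utility of each outcome

# Savage-style sum over states.
eu_over_states = sum(u[f[s]] * P[s] for s in states)

# Same value, grouped by outcome: the state-enumeration work now lives
# inside the probability of the event "f evaluates to o".
eu_over_outcomes = sum(u[o] * sum(P[s] for s in states if f[s] == o) for o in u)

assert abs(eu_over_states - eu_over_outcomes) < 1e-12
print(eu_over_states)   # 0.7
```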

> We can either start with a basic set of "worlds" (e.g., infinite bit-strings) and define our "propositions" or "events" as sets of worlds <...>

That's just math-speak: you can define a lot of things as a lot of other things, but that doesn't mean the agent is going to literally iterate over infinite sets of infinite bit strings and evaluate something on each of them.

By the way, I might not see any more replies to this.

> A classical probability distribution over Ω with a utility function understood as a random variable can easily be converted to the Jeffrey-Bolker framework, by taking the JB algebra as the sigma-algebra, and V as the expected value of U.

Ok, you're saying that JB is just a set of axioms, and U already satisfies those axioms. And in this construction an "event" really is a subset of Omega, and "updates" are just updates of P, right? Then of course U is not more general; I had the impression that JB was a more distinct and specific thing.
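For what it's worth, here is a toy illustration (my own numbers, not from the post) of the conversion the quote describes: take a small Omega with P and U, and read off the value of an event as the conditional expectation of U on that event:

```python
Omega = ["w1", "w2", "w3"]
P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
U = {"w1": 0.0, "w2": 1.0, "w3": 1.0}

def V(event):
    """Value of an event (a set of elements of Omega) as the conditional expectation of U."""
    mass = sum(P[w] for w in event)
    return sum(U[w] * P[w] for w in event) / mass

print(V({"w2", "w3"}))   # value of the event {w2, w3} -> 1.0
print(V(set(Omega)))     # value of the sure event = ordinary expected utility -> 0.8
```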

Regarding the other direction, my sense is that you will have a very hard time writing down these updates, and when it works, the code will look a lot like code with a utility function. But, again, the example in "Updates Are Computable" isn't detailed enough for me to argue anything. Although, now that I look at it, it does look a lot like U(p) = 1 - p("never press the button").

> events (ie, propositions in the agent's internal language)

I think you should include this explanation of events in the post.

> construct 'worlds' as maximal specifications of which propositions are true/false

It remains totally unclear to me why you demand that a "world" be such a thing.

> I'm not sure why you say Omega can be the domain of U but not the entire ontology.

My point is that if U has two output values, then it only needs two possible inputs. Maybe you're saying that if |dom(U)| = 2, then there is no point in having |dom(P)| > 2, and maybe you're right, but I feel no need to make such claims. Even if the domains are different, they are not unrelated; Omega is still in some way contained in the ontology.

> I agree that we can put even more stringent (and realistic) requirements on the computational power of the agent

We could, and I think we should. I have no idea why we're talking math instead of writing code for some toy agents in some toy simulation. Math has a tendency to sweep all kinds of infinite and intractable problems under the rug.

Answering out of order:

> <...> then I think the Jeffrey-Bolker setup is a reasonable formalization.

Jeffrey is a reasonable formalization; it was never my point to say that it isn't. My point is only that U is also reasonable, and possibly equivalent or more general, and that there is no "case against" it. Although, if you find Jeffrey more elegant or comfortable, there is nothing wrong with that.

> do you believe that any plausible utility function on bit-strings can be re-represented as a computable function (perhaps on some other representation, rather than bit-strings)?

I don't know what "plausible" means, but no, that sounds like a very high bar. I believe that if there is at least one U that produces an intelligent agent, then utility functions are interesting and worth considering. Of course I believe that there are many such "good" functions, but I would not claim that I can describe the set of all of them. At the same time, I don't see why any "good" utility function should be uncomputable.

> I think there is a good reason to imagine that the agent structures its ontology around its perceptions. The agent cannot observe whether-the-button-is-ever-pressed; it can only observe, on a given day, whether the button has been pressed on that day. |Omega|=2 is too small to even represent such perceptions.

I agree with the first sentence; however, Omega is merely the domain of U, and it does not need to be the entire ontology. In this case Omega = {"button has been pressed", "button has not been pressed"} and P("button has been pressed" | "I'm pressing the button") ≈ 1. Obviously, there is also no problem with extending Omega with the perceptions, all the way up to |Omega| = 4, or with adding some clocks.
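For concreteness, here is a toy version of that two-element Omega, with made-up action names and conditional probabilities (all mine, not from the post): U is defined only on the two outcomes, while the action-dependence lives entirely in P.

```python
U = {"button has been pressed": 1.0, "button has not been pressed": 0.0}

# P(outcome | action); the numbers are made up for illustration.
P_given_action = {
    "press the button": {"button has been pressed": 0.99, "button has not been pressed": 0.01},
    "do nothing":       {"button has been pressed": 0.05, "button has not been pressed": 0.95},
}

def expected_utility(action):
    return sum(U[outcome] * P_given_action[action][outcome] for outcome in U)

best_action = max(P_given_action, key=expected_utility)
print(best_action, expected_utility(best_action))   # -> "press the button" 0.99
```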

> We could expand the scenario so that every "day" is represented by an n-bit string.

If you want to force the agent to remember the entire history of the world, then you'll run out of storage space before you need to worry about computability. A real agent would have to start forgetting days, or keep some compressed summary of that history. It seems to me that Jeffrey would "update" the daily utilities into total expected utility; in that case, U can do something similar.
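As a hedged sketch of that "compressed summary" idea (the Summary fields, the decoding rule, and the daily observations below are all my own invented stand-ins):

```python
from dataclasses import dataclass

@dataclass
class Summary:
    days_seen: int = 0
    button_ever_pressed: bool = False

def update(summary: Summary, todays_observation: bytes) -> Summary:
    """Fold one day's n-bit observation into the fixed-size summary."""
    pressed_today = bool(todays_observation[0] & 1)   # toy decoding rule, my own choice
    return Summary(summary.days_seen + 1,
                   summary.button_ever_pressed or pressed_today)

def utility(summary: Summary) -> float:
    # U only needs the compressed summary, not the full history of n-bit days.
    return 1.0 if summary.button_ever_pressed else 0.0

s = Summary()
for day in [b"\x00", b"\x00", b"\x01"]:   # three made-up daily observations
    s = update(s, day)
print(utility(s))   # -> 1.0
```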

> I can always "extend" a world with an extra, new fact which I had not previously included. IE, agents never "finish" imagining worlds; more detail can always be added

You defined U at the very beginning, so there is no need to send these new facts to U; it doesn't care. Instead, you are describing a problem with P, and it's a hard problem, but Jeffrey also uses P, so that doesn't solve it.

> > ... set our model to be a list of "events" we've observed ...
>
> I didn't understand this part.

If you "evaluate events", then events have some sort of bit representation in the agent, right? I don't clearly see the events in your "Updates Are Computable" example, so I can't say much and I may be confused, but I have a strong feeling that you could define U as a function on those bits, and get the same agent.

> This is an interesting alternative, which I have never seen spelled out in axiomatic foundations.

The point would be to set U(p) = p("button has been pressed") and then decide to "press the button" by evaluating U(P conditioned on "I'm pressing the button") * P("I'm pressing the button" | "press the button"), where P is the agent's current belief, and p is a variable of the same type as P.
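Here is my own toy rendering of that decision rule, with made-up probabilities; the `condition` helper is a hypothetical stand-in for whatever belief update the agent actually uses:

```python
def condition(p, likelihood):
    """Bayes-update the belief p, given likelihood[outcome] = P(evidence | outcome)."""
    unnormalized = {o: p[o] * likelihood[o] for o in p}
    z = sum(unnormalized.values())
    return {o: v / z for o, v in unnormalized.items()}

def U(p):
    # Utility is a function of the belief itself, not of a single world.
    return p["button has been pressed"]

# Current belief over Omega.
P = {"button has been pressed": 0.1, "button has not been pressed": 0.9}

# P("I'm pressing the button" | outcome) -- illustrative likelihood.
pressing_likelihood = {"button has been pressed": 0.99,
                       "button has not been pressed": 0.01}

# P("I'm pressing the button" | the action "press the button") -- illustrative.
p_pressing_given_action = 0.95

score = U(condition(P, pressing_likelihood)) * p_pressing_given_action
print(score)   # high score -> the agent decides to press
```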

I certainly don't evaluate my U on quarks. Omega is not the set of worlds; it is the set of world models, and we are the ones who decide what that model should be. In the "procrastination" example you intentionally picked a bad model, so it proves nothing (if the world only has one button we care about, then maybe |Omega| = 2 and everything is perfectly computable).

Further on, it seems to me that if we set our model to be a list of "events" we've observed, then we get the exact thing you're talking about. However, you're imprecise and inconsistent about what an event is, how it's represented, and how many there are, so I'm not sure whether that's supposed to make anything more tractable.

In general, asking questions about the domain of U (and P!) is a good idea, and something that all introductions to utility functions lack. But the ease with which you abandon a perfectly good formalism is concerning. LI is cool, and it doesn't use U, but that's not an argument against U; at best you can say that U was not as useful as you'd hoped.

My own take is that the domain of U is the type of P. That is, U is evaluated on possible functions P. P certainly represents everything the agent cares about in the world, and it's also already small and efficient enough to be stored and updated in the agent, so this solution creates no new problems.