Abram Demski's Comments

Modeling naturalized decision problems in linear logic

I haven't tried as hard as I could have to understand this, so sorry if this comment is low quality.

But I currently don't see the point of employing linear logic in the way you are doing it.

The appendix suggests that the solution to spurious counterfactuals here is the same as known ideas for resolving that problem, which seems right to me. So solving spurious counterfactuals isn't the novel aspect here.

But then I'm confused about why you focus on 5&10 in the main body of the post, since spurious counterfactuals are the main point of the 5&10 problem.

Maybe 5&10 is just a super simple example to illustrate things. But then, I don't know what it is you are illustrating. What is linear logic doing for you that you could not do some other way?

I have heard the suggestion before that linear logic could be used to help with the difficulties of logical counterfactuals. But (somehow) those suggestions seemed to be doing something more radical. Spurious counterfactuals were supposed to be blocked by something about the restrictive logic. By allowing the chosen action to be used only once (since it gets consumed when used), something nicer was supposed to happen, perhaps avoiding problematic self-referential chains of reasoning.

(As I see it at the moment, linear logic seems to -- if anything -- work against the kind of thing we typically want to achieve. If you can use "my program, when executed, outputs 'one-box'" only once, you can't re-use the result both within Omega's thinking and within the physical choice of box. So linear logic would seem to make it hard to respect logical correlations. Of course this doesn't happen for your proposal here, since you treat the program output as classical.)
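
To illustrate the worry in the parenthetical above, here is a toy formalization (mine, not from the post): write A for "my program outputs 'one-box'", B for "Omega fills the opaque box", and C for "I physically take one box". In linear logic, a single copy of A cannot feed both implications, while a copy marked reusable with the exponential can:

$$A,\ A \multimap B,\ A \multimap C \;\not\vdash\; B \otimes C \qquad\text{but}\qquad {!A},\ A \multimap B,\ A \multimap C \;\vdash\; B \otimes C$$

So unless the program-output fact is flagged as reusable, it cannot simultaneously support Omega's prediction and the physical choice -- which is the sense in which the resource discipline seems to cut against respecting logical correlations.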

But your use (iiuc!) seems less radical. You are kind of just using linear logic as a way to specify a world model. But I don't see what this does for you. What am I missing?

An Orthodox Case Against Utility Functions

Of course, this interpretation requires a fair amount of reading between the lines, since the Jeffrey-Bolker axioms make no explicit mention of any probability distribution, but I don’t see any other reasonable way to interpret them,

Part of the point of the JB axioms is that probability is constructed together with utility in the representation theorem, in contrast to VNM, which constructs utility via the representation theorem, but takes probability as basic.

This makes Savage a better comparison point, since the Savage axioms are more similar to the VNM framework while also trying to construct probability and utility together with one representation theorem.

VNM does not start out with a prior, and allows any probability distribution over outcomes to be compared to any other, whereas Jeffrey-Bolker only allows comparison of probability distributions obtained by conditioning the prior on an event.

As a representation theorem, this makes VNM weaker and JB stronger: VNM requires stronger assumptions (it requires that the preference structure include information about all these probability-distribution comparisons), whereas JB only requires preference comparisons between events which the agent sees as real possibilities. A similar remark can be made of Savage.
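
To make the contrast concrete (this is just the standard shape of the two frameworks, not anything specific to the post): VNM asks for a preference ordering over all lotteries over outcomes and represents it by expected utility, while the JB representation theorem delivers a probability P and desirability Des on the agent's single algebra of events, jointly satisfying the averaging law for disjoint events:

$$\mathrm{Des}(A \cup B) \;=\; \frac{P(A)\,\mathrm{Des}(A) + P(B)\,\mathrm{Des}(B)}{P(A) + P(B)} \qquad (A \cap B = \emptyset,\ P(A), P(B) > 0)$$

Every comparison JB requires is therefore a comparison of events under the one subjective probability, rather than of arbitrary externally-specified gambles.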

Starting with a prior that gets conditioned on events that correspond to the agent’s actions seems to build in evidential decision theory as an assumption, which makes me suspicious of it.

Right, that's fair. Although: James Joyce, the big CDT advocate, is quite the Jeffrey-Bolker fan! See Why We Still Need the Logic of Decision for his reasons.

I don’t think the motivation for this is quite the same as the motivation for pointless topology, which is designed to mimic classical topology in a way that Jeffrey-Bolker-style decision theory does not mimic VNM-style decision theory. [...] So a similar thing here would be to treat a utility function as a function from some lattice of subsets of ℝ (the Borel subsets, for instance) to the lattice of events.

Doesn't pointless topology allow for some distinctions which aren't meaningful in pointful topology, though? (I'm not really very familiar with it; I'm just going off of something I've heard.)

Isn't the approach you mention pretty close to JB? You're not modeling the VNM/Savage thing of arbitrary gambles; you're just assigning values (and probabilities) to events, like in JB.

Setting aside VNM and Savage and JB, and considering the most common approach in practice -- use the Kolmogorov axioms of probability, and treat utility as a random variable -- it seems like the pointless analogue would be close to what you say.
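
One way to spell out that pointless analogue (my own gloss): start from a probability space with a utility random variable U, then forget the individual worlds and keep only the event-level quantities

$$V(E) \;=\; \mathbb{E}[U \mid E] \;=\; \frac{1}{P(E)} \int_E U \, dP \qquad (P(E) > 0)$$

The pair (P, V) on the algebra of events satisfies exactly the JB-style averaging law above, and in the pointless version it is taken as basic, with U on individual worlds never mentioned.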

This can be resolved by defining worlds to be minimal non-zero elements of the completion of the Boolean algebra of events, rather than a minimal non-zero event.

Yeah. The question remains, though: should we think of utility as a function of these minimal elements of the completion? Or not? The computability issue I raise is, to me, suggestive of the negative.

An Orthodox Case Against Utility Functions

First, it seems to me rather clear what macroscopic physics I attach utility to. If I care about people, this means my utility function comes with some model of what a “person” is (that has many free parameters), and if something falls within the parameters of this model then it’s a person,

This does not strike me as the sort of thing which will be easy to write out. But there are other examples. What if humans value something like observer-independent beauty? EG, valuing beautiful things existing regardless of whether anyone observes their beauty. Then it seems pretty unclear what ontological objects it gets predicated on.

Second, what does it mean for a hypothesis to be “individual”? If we have a prior over a family of hypotheses, we can take their convex combination and get a new individual hypothesis. So I’m not sure what sort of “fluidity” you imagine that is not supported by this.

What I have in mind is complicated interactions between different ontologies. Suppose that we have one ontology -- the ontology of classical economics -- in which:

  • Utility is predicated on individuals alone.
  • Individuals always and only value their own hedons; any apparent revealed preference for something else is actually an indication that observing that thing makes the person happy, or that behaving as if they value that other thing makes them happy. (I don't know whether this is really part of classical economics, but it seems at least highly correlated with classical-econ views.)
  • Aggregate utility (across many individuals) can only be defined by giving an exchange rate, since utility functions of different individuals are incomparable. However, an exchange rate is implicitly determined by the market.

And we have another ontology -- the hippie ontology -- in which:

  • Energy, aka vibrations, is an essential part of social interactions and other things.
  • People and things can have good energy and bad energy.
  • People can be on the same wavelength.
  • Etc.

And suppose what we want to do is try to reconcile the value-content of these two different perspectives. This isn't going to be a mixture between two partial hypotheses. It might actually be closer to an intersection between two partial hypotheses -- since the different hypotheses largely talk about different entities. But that won't be right either. Rather, there is philosophical work to be done, figuring out how to appropriately mix the values which are represented in the two ontologies.

My intuition behind allowing preference structures which are "uncomputable" as functions of fully specified worlds is, in part, that one might continue doing this kind of philosophical work in an unbounded way -- IE there is no reason to assume there's a point at which this philosophical work is finished and you now have something which can be conveniently represented as a function of some specific set of entities. Much like logical induction never finishes and hands you a Bayesian probability function, even though it gets closer over time.

The agent doesn’t have full Knightian uncertainty over all microscopic possibilities. The prior is composed of refinements of an “ontological belief” that has this uncertainty. You can even consider a version of this formalism that is entirely Bayesian (i.e. each refinement has to be maximal),

OK, that makes sense!

but then you lose the ability to retain an “objective” macroscopic reality in which the agent’s point of view is “unspecial”, because if the agent’s beliefs about this reality have no Knightian uncertainty then it’s inconsistent with the agent’s free will (you could “avoid” this problem using an EDT or CDT agent but this would be bad for the usual reasons EDT and CDT are bad, and ofc you need Knightian uncertainty anyway because of non-realizability).

Right.

An Orthodox Case Against Utility Functions

I don't want to make a strong argument against your position here. Your position can be seen as one example of "don't make utility a function of the microscopic".

But let's pretend for a minute that I do want to make a case for my way of thinking about it as opposed to yours.

  • Humans are not clear on what macroscopic physics we attach utility to. It is possible that we can emulate human judgement sufficiently well by learning over macroscopic-utility hypotheses (ie, partial hypotheses in your framework). But perhaps no individual hypothesis will successfully capture the way human value judgements fluidly switch between macroscopic ontologies -- perhaps human reasoning of this kind can only be accurately captured by a dynamic LI-style "trader" who reacts flexibly to an observed situation, rather than a fixed partial hypothesis. In other words, perhaps we need to capture something about how humans reason, rather than any fixed ontology (even of the flexible macroscopic kind).
  • Your way of handling macroscopic ontologies entails Knightian uncertainty over the microscopic possibilities. Isn't that going to lack a lot of optimization power? EG, if humans reasoned this way using intuitive physics, we'd be afraid that any science experiment creating weird conditions might destroy the world, and try to minimize the chances of those situations being set up, or something along those lines? I'm guessing you have some way to mitigate this, but I don't know how it works.

As for discontinuous utility:

For example, since the utility functions you consider are discontinuous, it is no longer guaranteed an optimal policy exists at all. Personally, I think discontinuous utility functions are strange and poorly motivated.

My main motivating force here is to capture the maximal breadth of what rational (ie coherent, ie non-exploitable) preferences can be, in order to avoid ruling out some human preferences. I have an intuition that this can ultimately help get the right learning-theoretic guarantees as opposed to hurt, but, I have not done anything to validate that intuition yet.

With respect to procrastination-like problems, optimality has to be subjective, since there is no foolproof way to tell when an agent will procrastinate forever. If humans have any preferences like this, then alignment means alignment with human subjective evaluations of this matter -- if the human (or some extrapolated human volition, like HCH) looks at the system's behavior and says "NO!! Push the button now, you fool!!" then the system is misaligned. The value-learning should account for this sort of feedback in order to avoid this. But this does not attempt to minimize loss in an objective sense -- we export that concern to the (extrapolated?) human evaluation which we are bounding loss with respect to.

With respect to the problem of no-optimal-policy, my intuition is that you try for bounded loss instead; so (as with logical induction) you are never perfect but you have some kind of mistake bound. Of course this is more difficult with utility than it is with pure epistemics.

An Orthodox Case Against Utility Functions

What happens if the author/definer of U(E) is wrong about the probabilities? If U(E) is not defined from, nor defined by, the value of its sums, what bad stuff happens if they aren’t equal?

Ultimately, I am advocating a logical-induction like treatment of this kind of thing.

  • Initial values are based on a kind of "prior" -- a distribution of money across traders.
  • Values are initially inconsistent (indeed, they're always somewhat inconsistent), but become more consistent over time as a result of traders correcting inconsistencies. The traders who are better at this get more money, while the chronically inconsistent traders lose money and eventually don't have influence any more.
  • Evidence of all sorts can come into the system, at any time. The system might suddenly get information about the utility of some hypothetical example, or a logical proposition about utility, whatever. It can be arbitrarily difficult to connect this evidence to practical cases. However, the traders work to reduce inconsistencies throughout the whole system, and therefore, evidence gets propagated more or less as well as it can be.
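
A toy illustration of the bookkeeping in the bullets above (my own drastic simplification in Python -- this is not the logical induction algorithm, and all names are made up): estimates are a wealth-weighted combination of traders' estimates, and traders whose estimates conflict with incoming evidence lose wealth.

```python
def combined_estimate(traders, event):
    """Wealth-weighted average of the traders' estimates for an event."""
    total_wealth = sum(t["wealth"] for t in traders)
    return sum(t["wealth"] * t["estimates"][event] for t in traders) / total_wealth

def settle_evidence(traders, event, observed_value, rate=0.5):
    """Evidence about an event's value arrives; traders further from it lose
    proportionally more wealth, so their influence shrinks over time."""
    for t in traders:
        error = abs(t["estimates"][event] - observed_value)
        t["wealth"] *= 1.0 - rate * min(error, 1.0)

traders = [
    {"wealth": 1.0, "estimates": {"press_button": 0.9, "wait": 0.2}},
    {"wealth": 1.0, "estimates": {"press_button": 0.1, "wait": 0.8}},
]

print(combined_estimate(traders, "press_button"))  # 0.5 before any evidence
settle_evidence(traders, "press_button", 1.0)      # evidence favors the first trader
print(combined_estimate(traders, "press_button"))  # ~0.61, moving toward 0.9
```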

An Orthodox Case Against Utility Functions

What does it mean for the all-zero universe to be infinite, as opposed to not being infinite? Finite universes have a finite number of bits of information describing them. (This doesn’t actually negate the point that uncomputable utility functions exist, merely that utility functions that care whether they are in a mostly-empty vs perfectly empty universe are a weak example.)

What it means here is precisely that it is described by an infinite number of bits -- specifically, an infinite number of zeros!
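
To spell the example out (my own formalization, with the utility values chosen just for concreteness): describe a world by an infinite bit-sequence, one bit per time-step, where bit t is 1 iff the button is pressed at step t. The never-pressed world is then the all-zero sequence, and the utility in question is something like

$$U(\omega) \;=\; \begin{cases} 1 & \text{if } \omega_t = 1 \text{ for some } t,\\ 0 & \text{otherwise.} \end{cases}$$

No finite prefix of the all-zero sequence distinguishes it from worlds where the button gets pressed later, so a machine reading the description bit by bit can eventually confirm that the utility is 1, but can never confirm that it is 0.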

Granted, we could try to reorganize the way we describe the universe so that we have a short code for that world, rather than an infinitely long one. This becomes a fairly subtle issue. I will say a couple of things:

First, it seems to me like the reductionist may want to object to such a reorganization. In the reductive view, it is important that there is a special description of the universe, in which we have isolated the actual basic facts of reality -- things resembling particle position and momentum, or what-have-you.

Second, I challenge you to propose a description language which (a) makes the procrastination example computable, (b) maps all worlds onto a description, and (c) does not create any invalid input tapes.

For example, I can make a modified universe-description in which the first bit is '1' if the button ever gets pressed. The rest of the description remains as before, placing a '1' at time-steps when the button is pressed (but offset by one place, to allow for the extra initial bit). So seeing '0' right away tells me I'm in the button-never-pressed world; it now has a 1-bit description, rather than an infinite-bit description. HOWEVER, this description language includes a description which does not correspond to any world, and is therefore invalid: the string which starts with '1' but then contains only zeros forever.
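
Here is a minimal Python sketch of the two description languages just described (the function names are mine, purely for illustration):

```python
from itertools import islice

def naive_world(press_times):
    """Naive description: bit t is 1 iff the button is pressed at step t."""
    t = 0
    while True:
        yield 1 if t in press_times else 0
        t += 1

def modified_world(press_times):
    """Modified description: a leading bit saying whether the button is EVER
    pressed, followed by the naive description (offset by one place)."""
    yield 1 if press_times else 0
    yield from naive_world(press_times)

def utility_from_modified(description):
    """On the modified language, the utility is just the first bit."""
    return next(iter(description))

print(list(islice(naive_world({3}), 6)))             # [0, 0, 0, 1, 0, 0]
print(utility_from_modified(modified_world(set())))  # 0 -- the never-pressed world
print(utility_from_modified(modified_world({3})))    # 1

# On the naive description there is no such shortcut: every finite prefix of
# the all-zero world is also a prefix of worlds with utility 1.  The cost of
# the modified language is the invalid description mentioned above: a leading
# 1 followed by zeros forever, which corresponds to no world.
```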

This issue has a variety of potential replies/implications -- I'm not saying the situation is clear. I didn't get into this kind of thing in the post because it seems like there are just too many things to say about it, with no totally clear path.

An Orthodox Case Against Utility Functions

Perhaps it goes without saying, but obviously, both frameworks are flexible enough to allow for most phenomena -- the question here is what is more natural in one framework or another.

My main argument is that the procrastination paradox is not natural at all in a Savage framework, as it suggests an uncomputable utility function. I think this plausibly outweighs the issue you're pointing at.

But with respect to the issue you are pointing at:

I try to think about what I expect to happen if I take that action (ie the outcome), and I think about how likely that outcome is to have various properties that I care about,

In the Savage framework, an outcome already encodes everything you care about. So the computation which seems to be suggested by Savage is to think of these maximally-specified outcomes, assigning them probability and utility, and then combining those to get expected utility. This seems to be very demanding: it requires imagining these very detailed scenarios.
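
For reference, the computation the Savage picture seems to call for (the standard representation, written for a finite state space for simplicity): an act f assigns to each state s an outcome f(s) that already specifies everything the agent cares about, and acts are compared by

$$f \succeq g \iff \sum_{s} P(s)\, U\!\big(f(s)\big) \;\ge\; \sum_{s} P(s)\, U\!\big(g(s)\big)$$

so evaluating even one act means assigning utilities to maximally-specified outcomes across all the states.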

Alternatively, we might say (as Savage said) that the Savage axioms apply to "small worlds" -- small scenarios which the agent abstracts from its experience, such as the decision of whether to break an egg for an omelette. These can be easily considered by the agent, if it can assign values "from outside the problem" in an appropriate way.

But then, to account for the breadth of human reasoning, it seems to me we also want an account of things like extending a small world when we find that it isn't sufficient, and coherence between different small-world frames for related decisions.

This gives a picture very much like the Jeffrey-Bolker picture, in that we don't really work with outcomes which completely specify everything we care about, but rather, work with a variety of simplified outcomes with coherence requirements between simpler and more complex views.

So overall I think it is better to have some picture where you can break things up in a more tractable way, rather than having full outcomes which you need to pass through to get values.

In the Jeffrey-Bolker framework, you can re-estimate the value of an event by breaking it up into pieces, estimating the value and probability of each piece, and combining them back together. This process could be iterated in a manner similar to dynamic programming in RL, to improve value estimates for actions -- although one needs to settle on a story about where the information originally comes from. I currently like the logical-induction-like picture where you get information coming in "somehow" (a broad variety of feedback is possible, including abstract judgements about utility which are hard to cash out in specific cases) and you try to make everything as coherent as possible in the meanwhile.
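
A toy sketch of that recombination step (my own bookkeeping in Python, not code from any post; the probabilities and values are made up):

```python
def reestimate(prob, value, pieces):
    """Recombine value estimates for disjoint pieces of an event into a value
    estimate for the whole event, weighting each piece by its conditional
    probability given the event."""
    p_event = sum(prob[piece] for piece in pieces)
    return sum((prob[piece] / p_event) * value[piece] for piece in pieces)

prob  = {"rain": 0.2, "shine": 0.8}   # unconditional probabilities of the pieces
value = {"rain": -1.0, "shine": 3.0}  # current value estimates for the pieces

# Re-estimated value of the event covered by the pieces "rain" and "shine":
print(reestimate(prob, value, ["rain", "shine"]))  # 2.2
```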

An Orthodox Case Against Utility Functions

Yeah, a didactic problem with this post is that when I write everything out, the "reductive utility" position does not sound that tempting.

I still think it's a really easy trap to fall into, though, because before thinking too much the assumption of a computable utility function sounds extremely reasonable.

Suppose I'm running a company, trying to maximize profits. I don't make decisions by looking at the available options, and then estimating how profitable I expect the company to be under each choice. Rather, I reason locally: at a cost of X I can gain Y, I've cached an intuitive valuation of X and Y based on their first-order effects, and I make the choice based on that without reasoning through all the second-, third-, and higher-order effects of the choice. I don't calculate all the way through to an expected utility or anything comparable to it.

With dynamic-programming inspired algorithms such as AlphaGo, "cached an intuitive valuation of X and Y" is modeled as a kind of approximate evaluation which is learned based on feedback -- but feedback requires the ability to compute U() at some point. (So you don't start out knowing how to evaluate uncertain situations, but you do start out knowing how to evaluate utility on completely specified worlds.)
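
A minimal sketch of that feedback structure (a generic temporal-difference backup in Python, not AlphaGo's actual algorithm; all names are illustrative):

```python
def td_update(value, state, next_state, terminal_utility, alpha=0.1):
    """One backup of the learned evaluation toward the value of the successor."""
    if next_state in terminal_utility:
        target = terminal_utility[next_state]  # bootstrap bottoms out: U() is computed here
    else:
        target = value.get(next_state, 0.0)    # otherwise, lean on the learned estimate
    old = value.get(state, 0.0)
    value[state] = old + alpha * (target - old)

value = {}
terminal_utility = {"win": 1.0, "loss": 0.0}   # assumes U() is computable on finished games
td_update(value, "midgame", "win", terminal_utility)
print(value)  # {'midgame': 0.1} -- the estimate moves toward the computed utility
```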

So one might still reasonably assume you need to be able to compute U() despite this.

Two Alternatives to Logical Counterfactuals

OK, all of that made sense to me. I find the direction more plausible than when I first read your post, although it still seems like it'll fall to the problem I sketched.

I both like and hate that it treats logical uncertainty in a radically different way from empirical uncertainty -- like, because we have so far failed to find any way to treat the two uniformly (besides being entirely updateful, that is); and hate, because it still feels so wrong for the two to be very different.
