Abram Demski


Pointing at Normativity
Consequences of Logical Induction
Partial Agency
Alternate Alignment Ideas
Embedded Agency


An intriguing perspective, but I'm not sure whether I agree. Naively, it would seem that a choice between fixed points in the FixDT setting is just a choice between different probability distributions, which brings us very close to the VNM idea of a choice between gambles. So VNM-like utility theory seems like the obvious outcome.

That being said, I don't really agree with the idea that an agent should have a fixed VNM-like utility function. So I do think some generalization is needed.

Yeah, "settles on" here meant however the agent selects beliefs. The epistemic constraint implies that the agent uses exhaustive search or some other procedure guaranteed to produce a fixed point, rather than Banach-style iteration. 

Moving to a Banach-like setting will often make the fixed points unique, which takes away the whole idea of FixDT.

Moving to a setting where the agent isn't guaranteed to converge would mean we have to re-write the epistemic constraint to be appropriate to that setting.
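
To make the contrast concrete, here is a minimal sketch under made-up assumptions: the response map `f`, the payoff `utility`, and the grid search are all hypothetical and are not part of any formal FixDT definition. A belief p is "epistemically acceptable" when f(p) = p; the instrumental step then chooses among those beliefs. The map is deliberately not a contraction, so several fixed points exist, and plain iteration from a starting belief only ever reaches one of them.

```python
# Hypothetical toy problem: the belief p in [0, 1] influences the true
# probability f(p) of the event. Fixed points are beliefs with f(p) == p.

def f(p: float) -> float:
    # Assumed response map; its fixed points are 0, 0.5 and 1.
    return 3 * p**2 - 2 * p**3

def utility(p: float) -> float:
    # Assumed instrumental payoff: the agent prefers the event to happen.
    return p

def fixed_points_by_search(steps: int = 1000, tol: float = 1e-9) -> list[float]:
    # Exhaustive grid search: keep every grid belief that is self-fulfilling.
    grid = [i / steps for i in range(steps + 1)]
    return [p for p in grid if abs(f(p) - p) <= tol]

def iterate(p0: float, rounds: int = 200) -> float:
    # Iteration: repeatedly replace the belief with f(belief).
    p = p0
    for _ in range(rounds):
        p = f(p)
    return p

candidates = fixed_points_by_search()   # epistemic constraint
best = max(candidates, key=utility)     # instrumental choice among them
```

In this toy case, `iterate(0.4)` drifts toward 0 and `iterate(0.6)` toward 1, so which fixed point the agent ends up with is decided by its starting belief rather than by expected utility.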

Yes, thanks for citing it here! I should have mentioned it, really.

I see the Skyrms iterative idea as quite different from the "just take a fixed point" theory I sketch here, although clearly they have something in common. FixDT makes it easier to combine both epistemic and instrumental concerns -- every fixed point obeys the epistemic requirement; and then the choice between them obeys the instrumental requirement. If we iteratively zoom in on a fixed point instead of selecting from the set, this seems harder?

If we try the Skyrms iteration thing, maybe the most sensible thing would be to move toward the beliefs of greatest expected utility -- but do so in a setting where epistemic utility emerges naturally from pragmatic concerns (such as A Pragmatist's Guide to Epistemic Decision Theory by Ben Levinstein). So the agent is only ever revising its beliefs in pragmatic ways, but we assume enough about the environment that it wants to obey both the epistemic and instrumental constraints? But, possibly, this assumption would just be inconsistent with the sort of decision problem which motivates FixDT (and Greaves).

I find your attempted clarification confusing. 

Our model is going to have some variables in it, and if we don't know in advance where the agent will be at each timestep, then presumably we don't know which of those variables (or which function of those variables, etc) will be our Markov blanket. 

No? A probabilistic model can just be a probability distribution over events, with no "random variables in it". It seemed like your suggestion was to define the random variables later, "on top of" the probabilistic model, not as an intrinsic part of the model, so as to avoid the objection that a physics-ish model won't have agent-ish variables in it.

So the random variables for our Markov blanket can just be defined as things like skin surface temperature & surface lighting & so on; random variables which can be derived from a physics-ish event space, but not by any particularly simple means (since the location of these things keeps changing).

On the other hand, if we knew which variables or which function of the variables were the blanket, then presumably we'd already know where the agent is, so presumably we're already conditioning on something when we say "the agent's boundary is a Markov blanket".

Again, no? If I know skin surface temperature and lighting conditions and so on all add up to a Markov blanket, I don't thereby know where the skin is.

I think that is a basically-correct argument. It doesn't actually argue that agent boundaries aren't Markov boundaries; I still think agent boundaries are basically Markov boundaries. But the argument implies that the most naive setup is missing some piece having to do with "where the agent is".

It seems like you agree with Sam way more than would naively be suggested by your initial reply. I don't understand why. 

When I talked with Sam about this recently, he was somewhat satisfied by your reply, but he did think there were a bunch of questions which follow. By giving up on the idea that the Markov blanket can be "built up" from an underlying causal model, we potentially give up on a lot of niceness desiderata which we might have wanted. So there's a natural question of how much you want to try and recover, which you could have gotten from "structural" Markov blankets, and might be able to get some other way, but don't automatically get from arbitrary Markov blankets.

In particular, if I had to guess: causal properties? I don't know about you, but my OP was mainly directed at Critch, and iiuc Critch wants the Markov blanket to have some causal properties so that we can talk about input/output. I also find it appealing for "agent boundaries" to have some property like that. But if the random variables are unrelated to a causal graph (which, again, is how I understood your proposal) then it seems difficult to recover anything like that.

Okay, so you know how AI today isn't great at certain... let's say "long-horizon" tasks? Like novel large-scale engineering projects, or writing a long book series with lots of foreshadowing?

(Modulo the fact that it can play chess pretty well, which is longer-horizon than some things; this distinction is quantitative rather than qualitative and it’s being eroded, etc.)

And you know how the AI doesn't seem to have all that much "want"- or "desire"-like behavior?

(Modulo, e.g., the fact that it can play chess pretty well, which indicates a certain type of want-like behavior in the behaviorist sense. An AI's ability to win no matter how you move is the same as its ability to reliably steer the game-board into states where you're check-mated, as though it had an internal check-mating “goal” it were trying to achieve. This is again a quantitative gap that’s being eroded.)

I don't think the following is all that relevant to the point you are making in this post, but someone cited this post of yours in relation to the question of whether LLMs are "intelligent" (summarizing the post as "Nate says LLMs aren't intelligent") and then argued against the post as goalpost-moving, so I wanted to discuss that.

It may come as a shock to some that Abram Demski adamantly defends the following position: GPT4 is AGI. I would be goalpost-moving if I said otherwise. I think the AGI community is goalpost-moving to the extent that it says otherwise.

I think there is some tendency in the AI Risk community to equate "AGI" with "the sort of AI which kills all the humans unless it is aligned". But "AGI" stands for "artificial general intelligence", not "kills all the humans". I think it makes more sense for the definition of AGI to be up to the community of AI researchers who use the term AGI to distance their work from narrow AI, rather than for it to be up to the AI risk community. And GPT4 is definitely not narrow AI.

I'll argue an even stronger claim: if you come up with a task which can be described and completed entirely in text format (and then evaluated somehow for performance quality), for most such tasks the performance of GPT4 is at or above the performance of a random human. (We can even be nice and only randomly sample humans who speak whichever languages are appropriate to the task; I'll still stand by the claim.) Yes, GPT4 has some weaknesses compared to a random human. But most claims of weaknesses I've heard are in fact contrasting GPT4 to expert humans, not random humans. So my stronger claim is: GPT4 is human-level AGI, maybe not by all possible definitions of the term, but by a very reasonable-seeming definition which 2014 Abram Demski might have been perfectly happy with. To deny this would be goalpost-moving for me; and, I expect, for many.

So (and I don't think this is what you were saying) if GPT4 were being ruled out of "human-level AGI" because it cannot write a coherent set of novels on its own, or do a big engineering project, well, I call shenanigans. Most humans can't do that either.

I'm looking at the Savage theory from your own https://plato.stanford.edu/entries/decision-theory/ and I see U(f) = ∑ᵢ u(f(sᵢ)) P(sᵢ), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega=S (and also O=S), but to me Omega=dom(u)=O.

(Just to be clear, I did not write that article.)

I think the interpretation of Savage is pretty subtle. The objects of preference ("outcomes") and objects of belief ("states") are treated as distinct sets. But how are we supposed to think about this?

  • The interpretation Savage seems to imply is that both outcomes and states are "part of the world", but the agent has somehow segregated parts of the world into matters of belief and matters of preference. But however the agent has done this, it seems to be fundamentally beyond the Savage representation; clearly within Savage, the agent cannot represent meta-beliefs about which matters are matters of belief and which are matters of preference. So this seems pretty weird. 
  • We could instead think of the objects of preference as something like "happiness levels" rather than events in the world. The idea of the representation theorem then becomes that we can peg "happiness levels" to real numbers. In this case, the picture looks more like standard utility functions; S is the domain of the function that gives us our happiness level (which can be represented by a real-valued utility). 
  • Another approach which seems somewhat common is to take the Savage representation but require that S=O. Savage's "acts" then become maps from world to world, which fits well with other theories of counterfactuals and causal interventions. 

So even within a Savage framework, it's not entirely clear that we would want the domain of the utility function to be different from the domain of the belief function.
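
For concreteness, the formula quoted above can be computed directly in a finite toy case. This is only a sketch: the states, outcomes, acts, and numbers are invented for illustration, and it follows the plain reading in which states (objects of belief) and outcomes (objects of preference) are distinct sets.

```python
# Toy Savage-style setup (all names and numbers are illustrative assumptions).
# States S are objects of belief; outcomes O are objects of preference;
# an act is a function from states to outcomes.

P = {"rain": 0.3, "sun": 0.7}                      # probability over states S
u = {"wet": -5.0, "dry": 0.0, "burdened": -1.0}    # utility over outcomes O

def take_umbrella(state: str) -> str:
    # Constant act: the same outcome in every state.
    return "burdened"

def leave_umbrella(state: str) -> str:
    return "wet" if state == "rain" else "dry"

def savage_eu(act) -> float:
    # U(f) = sum over states s of u(f(s)) * P(s)
    return sum(u[act(s)] * p for s, p in P.items())
```

Under the third reading in the list (S=O), `act` would instead map a world to a world, and `u` and `P` would share a single domain.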

I should also have mentioned the super-common VNM picture, where utility has to be a function of arbitrary states as well.

That's just math speak, you can define a lot of things as a lot of other things, but that doesn't mean that the agent is going to be literally iterating over infinite sets of infinite bit strings and evaluating something on each of them.

The question is, what math-speak is the best representation of the things we actually care about? 

It remains totally unclear to me why you demand the world to be such a thing.

Ah, if you don't see 'worlds' as meaning any such thing, then I wonder, are we really arguing about anything at all?

I'm using 'worlds' that way in reference to the same general setup which we see in propositions-vs-models in model theory, or in Ω vs the σ-algebra in the Kolmogorov axioms, or in Kripke frames, and perhaps some other places.

We can either start with a basic set of "worlds" (eg, Ω) and define our "propositions" or "events" as sets of worlds, where that proposition/event 'holds' or 'is true' or 'occurs'; or, equivalently, we could start with an algebra of propositions/events (like a σ-algebra) and derive worlds as maximally specific choices of which propositions are true and false (or which events hold/occur).
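
In the finite case the two directions are easy to see side by side. The sketch below uses hypothetical proposition names and an assumed logical link between them. It is finite only, so it says nothing about infinite or atomless algebras. It derives "worlds" as maximally specific consistent truth assignments, then recovers each proposition as the set of worlds where it holds.

```python
from itertools import product

# Illustrative basic propositions (names are assumptions for this sketch).
props = ["button_pressed_today", "button_ever_pressed"]

def consistent(assignment: dict[str, bool]) -> bool:
    # Assumed logical link: pressed today implies pressed at some time.
    return (not assignment["button_pressed_today"]) or assignment["button_ever_pressed"]

# Direction 1: propositions -> worlds (maximal consistent truth assignments).
worlds = [
    dict(zip(props, values))
    for values in product([True, False], repeat=len(props))
    if consistent(dict(zip(props, values)))
]

# Direction 2: worlds -> propositions (an event is the set of worlds where it holds).
def event(prop: str) -> frozenset[int]:
    return frozenset(i for i, w in enumerate(worlds) if w[prop])
```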

My point is that if U has two output values, then it only needs two possible inputs. Maybe you're saying that if |dom(U)|=2, then there is no point in having |dom(P)|>2, and maybe you're right, but I feel no need to make such claims.

Maybe I should just let you tell me what framework you are even using in the first place. There are two main alternatives to the Jeffrey-Bolker framework which I have in mind: the Savage axioms, and also the thing commonly seen in statistics textbooks where you have a probability distribution which obeys the Kolmogorov axioms and then you have random variables over that (random variables being defined as functions of type Ω → ℝ). A utility function is then treated as a random variable.

It doesn't sound like your notion of utility function is any of those things, so I just don't know what kind of framework you have in mind.

My point is only that U is also reasonable, and possibly equivalent or more general. That there is no "case against" it. 

I do agree that my post didn't do a very good job of delivering a case against utility functions, and actually only argues that there exists a plausibly-more-useful alternative to a specific view which includes utility functions as one of several elements.

Utility functions definitely aren't more general.

A classical probability distribution over Ω with a utility function understood as a random variable can easily be converted to the Jeffrey-Bolker framework, by taking the JB algebra as the sigma-algebra, and V as the expected value of U. Technically the sigma-algebra needs to be atomless to fit JB exactly, but Zoltan Domotor (Axiomatization of Jeffrey Utilities) generalizes this considerably.
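
A finite sketch of this easy direction (classical to Jeffrey-Bolker), with made-up worlds and numbers. A finite algebra has atoms, so it does not meet the atomless requirement mentioned above; the point is only to show V arising as a conditional expectation of U, and to check Jeffrey's averaging property on one pair of disjoint events.

```python
# Illustrative finite world set with probabilities and utilities (assumed values).
P = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
U = {"w1": 10.0, "w2": 0.0, "w3": -4.0}

def prob(event: frozenset[str]) -> float:
    return sum(P[w] for w in event)

def V(event: frozenset[str]) -> float:
    # Value of an event: the conditional expectation of U given the event.
    # Left undefined for probability-zero events (a choice made in this sketch).
    p = prob(event)
    if p == 0:
        raise ValueError("V is undefined on probability-zero events")
    return sum(P[w] * U[w] for w in event) / p

A = frozenset({"w1"})
B = frozenset({"w2", "w3"})

# Averaging property for disjoint A and B:
# V(A or B) = (P(A) V(A) + P(B) V(B)) / (P(A) + P(B))
lhs = V(A | B)
rhs = (prob(A) * V(A) + prob(B) * V(B)) / (prob(A) + prob(B))
```

Note that `V` only ever takes events as input; nothing in its interface requires the caller to enumerate individual worlds, even though this toy implementation does so internally.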

I've heard people say that there is a way to convert in the other direction, but that it requires ultrafilters (so in some sense it's very non-constructive). I haven't been able to find this construction yet or had anyone explain how it works.

So it seems to me, but I recognize that I haven't shown in detail, that the space of computable values is strictly broader in the JB framework; computable utility functions + computable probability gives us computable JB-values, but computable JB-values need not correspond to computable utility functions.

Thus, the space of minds which can be described by the two frameworks might be equivalent, but the space of minds which can be described by computations does not seem to be; the JB space, there, is larger.

I don't see why any "good" utility function should be uncomputable.

Well, the Jeffrey-Bolker kind of explanation is as follows: agents really only need to consider and manipulate the probabilities and expected values of events (ie, propositions in the agent's internal language). So it makes some sense to assume that these probabilities and expected values are computable. But this does not imply (as far as I know) that we can construct 'worlds' as maximal specifications of which propositions are true/false and then define a utility function on those worlds which is consistent with the computable expected values and have that utility function itself be computable. And indeed it seems rather plausible to me that this is not the case, even for values which otherwise seem relatively unremarkable, as illustrated by examples like the procrastination paradox.

I think there is a good reason to imagine that the agent structures its ontology around its perceptions. The agent cannot observe whether-the-button-is-ever-pressed; it can only observe, on a given day, whether the button has been pressed on that day. |Omega|=2 is too small to even represent such perceptions.

I agree with the first sentence, however Omega is merely the domain of U, it does not need to be the entire ontology. In this case Omega={"button has been pressed", "button has not been pressed"} and P("button has been pressed" | "I'm pressing the button")~1. Obviously, there is also no problem with extending Omega with the perceptions, all the way up to |Omega|=4, or with adding some clocks.

I'm not sure why you say Omega can be the domain of U but not the entire ontology. This seems to mean that we don't know how to take expected values for arbitrary events. Also it means you are no longer advocating for the model I'm arguing against, where U is a random variable.

We could expand the scenario so that every "day" is represented by an n-bit string.

If you want to force the agent to remember the entire history of the world, then you'll run out of storage space before you need to worry about computability. A real agent would have to start forgetting days, or keep some compressed summary of that history. It seems to me that Jeffrey would "update" the daily utilities into total expected utility; in that case, U can do something similar.

I agree that we can put even more stringent (and realistic) requirements on the computational power of the agent, and then both JB and random-variable treatments become implausible, in so far as those treatments involve infinitely large representations.

I still think that the Jeffreyesque representational choice of using compact event-propositions, rather than fully-specified worlds, seems more plausible with respect to such bounded agents.

You defined U at the very beginning, so there is no need to send these new facts to U, it doesn't care. Instead, you are describing a problem with P, and it's a hard problem, but Jeffrey also uses P, so that doesn't solve it.

As per my earlier comment on "Omega is merely the domain of U", I think here you're abandoning elements of the random-variable approach to U, and in fact reasoning in a more JB-esque way.

>  ... set our model to be a list of "events" we've observed ...
I didn't understand this part.

If you "evaluate events", then events have some sort of bit representation in the agent, right? I don't clearly see the events in your "Updates Are Computable" example, so I can't say much and I may be confused, but I have a strong feeling that you could define U as a function on those bits, and get the same agent.

Yeah, it seems like we're talking past each other here and would need to do more work to unpack what's going on. All I can think to say right now is this: the usual random-variable approach to defining U requires that probabilities respect countable additivity, because the event of "the button being pressed" is just the set of individual worlds where that happens (where the button gets pressed on a particular day). This is the root of the computational difficulty in the standard approach. JB doesn't require countable additivity, since it isn't a rule which agents can enforce on their beliefs by touching only finitely many of them. This harkens back to something you said earlier:

Instead, you are describing a problem with P, and it's a hard problem, but Jeffrey also uses P, so that doesn't solve it.

Which I agree with in this case, except that JB does "solve" it by explicitly relaxing that constraint.

Again, this is a way in which JB is more general, not less; JB could follow that constraint, if you like.

I agree that it makes more sense to suppose "worlds" are something closer to how the agent imagines worlds, rather than quarks. But on this view, I think it makes a lot of sense to argue that there are no maximally specific worlds -- I can always "extend" a world with an extra, new fact which I had not previously included. IE, agents never "finish" imagining worlds; more detail can always be added (even if only in separate magisteria, eg, imagining adding epiphenomenal facts). I can always conceive of the possibility of a new predicate beyond all the predicates which a specific world-model discusses.

If you buy this, then I think the Jeffrey-Bolker setup is a reasonable formalization.

If you don't buy this, my next question would be whether you really think that the sort of "world" ("world model", as you called it) which an agent attaches value to is always "closed off" (ie specifies all the facts one way or the other; does not admit further detail) -- or, perhaps, you merely want to argue that this can sometimes be the case but not always. (Because if it's sometimes the case but not always, this argues against both the traditional view where Omega is the set which the probability is a measure over & the utility function is a function of, and against the Jeffrey-Bolker picture.)

I find it implausible that the sort of "world model" which we can model humans as having-values-as-a-function-of is "closed off" -- we can appreciate ideas like atoms and quarks, adding these to our ontology, without necessarily changing other aspects of our world-model. Perhaps sometimes we can "close things off" like this -- we can consider the possibility that there "is nothing else" -- but even so, I think this is better-modeled as an additional assertion which we add to the set of propositions defining a possibility rather than modeling us as having bottomed out in an underlying set of "worlds" which inherently decide all propositions.

In the "procrastination" example you intentionally picked a bad model, so it proves nothing (if the world only has one button we care about, then maybe |Omega|=2 and everything is perfectly computable).

You seem to be suggesting that any such example could be similarly re-written to make things nicely computable. I find this implausible. We could expand the scenario so that every "day" is represented by an n-bit string. The computable function b() looks at a "day" and tells us whether the button was pressed or not on that day. As before, we get -10 utility if the button is never pressed. But we also have some (computable) reward, r(), which is a function of a "day" and tells us how good or bad that day was. The discounted reward is such that these priorities are never more important than whether or not the button is pressed; but so long as the button is eventually pressed, we prefer to get more reward rather than less. How would you change the representation now?
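
To make the difficulty concrete, here is a sketch under assumed specifics that are not part of the example as stated: bit 0 of each "day" encodes the button, the reward is the fraction of ones in the remaining bits, the discount factor is 1/2, and total discounted reward therefore lies in [0, 1], well below the 10-point button penalty. Given a finite prefix of days, the best a program can report is an interval of utilities compatible with that prefix.

```python
Day = tuple[int, ...]  # an n-bit string; bit 0 encodes the button (assumption)

def b(day: Day) -> bool:
    # Computable: was the button pressed on this day?
    return day[0] == 1

def r(day: Day) -> float:
    # Computable per-day reward in [0, 1] (assumed form).
    rest = day[1:]
    return sum(rest) / len(rest) if rest else 0.0

def utility_bounds(prefix: list[Day]) -> tuple[float, float]:
    # Discounted reward already locked in by the observed days.
    known = sum(r(d) * 0.5 ** (t + 1) for t, d in enumerate(prefix))
    # Maximum discounted reward still obtainable from unseen days.
    tail = 0.5 ** len(prefix)
    if any(b(d) for d in prefix):
        return known, known + tail   # interval shrinks as the prefix grows
    # No press yet: "never pressed" (-10) is still possible, and so is a later press.
    return -10.0, known + tail
```

Once a press has been seen, the interval narrows geometrically. While no press has been seen, it stays at least 10 wide however long the prefix gets, which is the sense in which the utility of the never-pressed history cannot be read off any finite amount of data in this representation.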

More generally, do you believe that any plausible utility function on bit-strings can be re-represented as a computable function (perhaps on some other representation, rather than bit-strings)? Why would you particularly expect this to be the case?

I think in arguing that I intentionally picked a bad model, you mean that the world-model representation which I chose was totally ad-hoc and chosen specifically to make things difficult to compute, and without having the goal in mind of making things difficult to compute, someone else would have chosen something simpler like |Omega|=2. But I think there is a good reason to imagine that the agent structures its ontology around its perceptions. The agent cannot observe whether-the-button-is-ever-pressed; it can only observe, on a given day, whether the button has been pressed on that day. |Omega|=2 is too small to even represent such perceptions.

Further on, it seems to me that if we set our model to be a list of "events" we've observed, then we get the exact thing you're talking about. Although you're imprecise and inconsistent about what an event is, how it's represented, how many there are, so I'm not sure if that's supposed to make anything more tractable.

I didn't understand this part.

In general, asking questions about the domain of U (and P!) is a good idea, and something that all introductions to Utility lack. But the ease with which you abandon a perfectly good formalism is concerning. LI is cool, and it doesn't use U, but that's not an argument against U, at best you can say that U was not as useful as you'd hoped.

Jeffrey-Bolker is fairly commonly advocated amongst decision theorists in philosophy (from both sides of the CDT-EDT debate!), although as far as I'm aware it hasn't made its way into stats textbooks at any level. It can be seen as part of a broader movement in mathematics, away from set-theoretic representations and toward more algebraic representations. A related example is pointless topology -- instead of understanding a topology as a structure imposed on a set of points, the structure of "opens" (no longer "open sets") is examined in its own right. In the same way that discarding "worlds" moves the formalism closer to concepts which the agent can actually realistically manipulate, discarding "points" from topology moves the math closer to the pieces which mathematicians are actually interested in manipulating.

My own take is that the domain of U is the type of P. That is, U is evaluated on possible functions P. P certainly represents everything the agent cares about in the world, and it's also already small and efficient enough to be stored and updated in the agent, so this solution creates no new problems. 

This is an interesting alternative, which I have never seen spelled out in axiomatic foundations.
