What Decision Theory is Implied By Predictive Processing?

11Abram Demski

4johnswentworth

2Gurkenglas

3Abram Demski

1Steve Byrnes

2Abram Demski

4Steve Byrnes

3johnswentworth

2Steve Byrnes

4Kaj Sotala

3johnswentworth

3Daniel Kokotajlo


It's obvious that you intend this as requiring research, including making good conceptual choices, rather than having a fixed answer. However, I'm going to speak from my current understanding of predictive processing.

I'm quite interested in your (John's) take on how the following differs from what you had in mind.

I believe there are several possible answers based on different ways of using predictive-processing-associated ideas.

**A. Soft-max decision-making.**

One thing I've seen in a presentation on this stuff is the claim of a close connection between probability and utility, namely *u = log(p)*.

This relates to a very common approximate model of bounded rationality: you introduce some randomness, but make worse mistakes less probable, by making actions exponentially more probable as their utility goes up. The level of rationality can be controlled by a "temperature" parameter -- higher temperature means more randomness, lower temperature means closer to just always taking the max.
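As a rough illustration of the soft-max rule described above, here is a minimal sketch. The function name `softmax_policy`, the action labels, and the utility values are all made up for the example; only the "probability goes up exponentially with utility, scaled by a temperature" idea comes from the text.

```python
import math

def softmax_policy(utilities, temperature):
    """Return action probabilities proportional to exp(u / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max utility before exponentiating, for numerical stability.
    u_max = max(utilities.values())
    weights = {a: math.exp((u - u_max) / temperature) for a, u in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical utilities for three actions (not from the original post).
utilities = {"good": 1.0, "okay": 0.5, "bad": -1.0}

hot = softmax_policy(utilities, temperature=10.0)   # high temperature: near uniform
cold = softmax_policy(utilities, temperature=0.05)  # low temperature: near argmax
```

At high temperature the three actions get similar probabilities; at low temperature almost all of the mass lands on the highest-utility action, which is the "closer to just always taking the max" regime.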

The *u = log(p)* idea takes that "approximation" as

The randomness can be interpreted as exploration. I don't personally see that interpretation as very good, since this form of randomness does not vary based on model uncertainty, but there may be justifications I'm not aware of.

The stronger attempt to justify the randomness, in my book, is based on *Monte Carlo inference*. However, that's better discussed under the next heading.

**B. Sampling from wishful thinking.**

If you were to construct an agent by the formula from option (A), you would first define the agent's beliefs and desires in the usual Bayesian way. You'd then calculate expected utilities for events in the normal way. You only depart from standard Bayesian decision-making at the last step, where you randomize rather than just taking the best action.

The implicit promise of the *u = log(p)* formula is to provide a deeper unification of belief and value than that, and correspondingly, a deeper restructuring of decision theory.

One commonly discussed proposal is as follows: *condition on success, then sample from the resulting distribution on actions.* (You don't necessarily have a binary notion of "success" if you attach real-valued utilities to the various outcomes, but there is a generalization where we condition on "utility being high" without exactly specifying how high it is. This will involve the same "temperature" parameter mentioned earlier.)
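One way to write this down, as a sketch rather than a definitive formalization: in the binary case, conditioning on success is just Bayes' rule over actions $a$,

$$P(a \mid \text{success}) = \frac{P(a)\,P(\text{success} \mid a)}{\sum_{a'} P(a')\,P(\text{success} \mid a')}.$$

For real-valued utility, one common formulation (an assumption here, not something the text above commits to) introduces an auxiliary binary "optimality" variable $O$ whose likelihood grows exponentially with the utility $U$ of the outcome, scaled by a temperature $T$:

$$P(O = 1 \mid \text{outcome}) \propto e^{U/T}, \qquad P(a \mid O = 1) \propto P(a)\,\mathbb{E}\!\left[e^{U/T} \mid a\right].$$

The symbols $O$, $U$, and $T$ are introduced only for this illustration. With a uniform prior $P(a)$ and deterministic outcomes, this reduces to the soft-max rule from option (A).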

The technical name for this idea is "planning by inference", because we can use algorithms for Monte Carlo inference to sample actions. We're using inference algorithms to plan! That's a useful unification of utility and probability: machinery previously used for one purpose, is now used for both.

It also kinda captures the intuition you mentioned, about restricting our world-model to assume some stuff we want to be true:

Abstracting out the key idea: we pack all of the complicated stuff into our world-model, hardcode some things into our world-model which we *want* to be true, then generally try to make the model match reality.

However, planning-by-inference can cause us to take some pretty dumb-looking actions.

For example, let's say that we need $200 for rent money. For simplicity, we have binary success/failure: either we get the money we need, or not. We have $25 which we can use to gamble, for a 1/16th chance of making the $200 we need. Alternately, we happen to know tomorrow's winning lotto numbers, which we can enter in for a 100% chance of getting the money we need.

However, suppose that under the prior, where we take actions at random, there is only a one-in-a-million chance of entering the winning lotto numbers.

Conditioning on our success, it's much more probable that we gamble with our $25 and get the money we need that way.

So planning-by-inference is heavily biased toward plans of action which are *not too improbable in the prior before conditioning on success*.
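The arithmetic behind this example can be sketched as follows. The 1/16 and one-in-a-million figures come from the example; the prior probability of 1/2 on gambling, and the catch-all "other" action that never succeeds, are assumptions added to make the numbers concrete.

```python
def condition_on_success(prior, p_success):
    """Bayes: P(action | success) is proportional to P(action) * P(success | action)."""
    joint = {a: prior[a] * p_success[a] for a in prior}
    total = sum(joint.values())
    return {a: j / total for a, j in joint.items()}

# From the example: random behavior enters the winning numbers with
# probability one in a million, and the gamble pays off 1/16 of the time.
p_lotto = 1e-6

# ASSUMPTION: the prior puts 1/2 on the simple "gamble" action; the rest of
# the mass is on "other" actions that never produce the rent money.
prior = {"gamble": 0.5, "lotto_right_numbers": p_lotto, "other": 0.5 - p_lotto}
p_success = {"gamble": 1 / 16, "lotto_right_numbers": 1.0, "other": 0.0}

posterior = condition_on_success(prior, p_success)
```

Under these assumed numbers, nearly all of the posterior mass sits on gambling, even though entering the right numbers succeeds with certainty: the one-in-a-million prior swamps the 16-to-1 advantage in success probability.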

On the other hand, the temperature parameter can help us out here. Adjusting the temperature looks kind of like "conditioning on success multiple times" -- IE, it's as if you took the new distribution on actions as the prior, and then conditioned again to further bias things in the direction of success.
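To make "conditioning on success multiple times" concrete, here is a small worked calculation. It assumes, purely for illustration, a prior of $1/2$ on the gamble and $10^{-6}$ on entering the right lotto numbers, with success probabilities $1/16$ and $1$ as in the example. Conditioning $k$ times gives

$$p_k(a) \propto p_0(a)\,P(\text{success} \mid a)^k,$$

so the odds of lotto against gambling after $k$ rounds are

$$\frac{p_k(\text{lotto})}{p_k(\text{gamble})} = \frac{10^{-6} \cdot 1^k}{\tfrac{1}{2} \cdot 16^{-k}} = 2 \times 10^{-6} \cdot 16^k.$$

This exceeds 1 once $16^k > 5 \times 10^5$, which first happens at $k = 5$ (since $16^4 = 65{,}536$ and $16^5 = 1{,}048{,}576$). If we read $P(\text{success} \mid a)$ as $e^{u(a)}$, then $p_k(a) \propto p_0(a)\,e^{k\,u(a)}$, which looks like a soft-max at temperature $T = 1/k$: more rounds of conditioning correspond to a lower temperature.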

This has a somewhat nice justification in terms of Monte Carlo algorithms. For some algorithms, this "temperature" ends up being an indication of *how long you took to think*. There's a bias toward actions with high prior probabilities because *that's where you look first when planning*, effectively (due to the randomness of the search).

This sounds like a nice account of bounded rationality: the randomness in the *u = log(p)* model is due to the boundedness of our search, and the fact that we may or may not find the good solutions in that time.

Except for one major problem: *this kind of random search isn't what humans, or AIs, do in general.* Even within the realm of Monte Carlo algorithms, there are a lot of optimizations one can add which would destroy the *u = log(p)* relationship. I don't currently know of any reason to suppose that there's some nice generalization which holds for computationally efficient minds.

So ultimately, I would say that there is a *sorta nice* theory of bounded rationality here, but not a *very nice* one.

Except... I actually know a way to address the concern about bias toward *a priori* actions, while sticking to the planning-by-inference picture, and also using an arguably much better theory of bounded rationality.

**C. Logical Induction Decision Theory**

As Scott discussed in a recent talk, if you try the planning-by-inference trick with a *logical inductor* as your inferencer, you maximize expected utility anyway:

This algorithm predicts what it did conditional on having won, and then copies that distribution. It just says, “output whatever I predict that I output conditioned on my having won”.

[...]

But it turns out that you do reach the same endpoint, because the only fixed point of this process is going to do the same as the last algorithm’s. So this algorithm turns out to be functionally the same as the previous one.

One way of understanding what's happening is this: in the planning-by-inference picture, we start with a prior, and condition on success, then sample actions. This creates a bias toward *a priori* probable actions, which can result in the irrational behavior I mentioned earlier.

In the context of logical induction, however, we additionally stipulate that *the a priori distribution on actions and the updated distribution must match.* This has the effect of "updating on success an infinite number of times" (in the sense that I mentioned earlier, where lowering the temperature is kind of like "updating on success again").
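The "prior must match posterior" condition can be illustrated with a toy iteration. To be clear, this is not logical induction; it is only a sketch of repeatedly feeding the conditioned distribution back in as the prior until it stops changing. The action names and the prior values are assumptions reused from the rent-money example.

```python
def condition_on_success(dist, p_success):
    """One round of Bayes: reweight each action by its success probability."""
    joint = {a: dist[a] * p_success[a] for a in dist}
    total = sum(joint.values())
    return {a: j / total for a, j in joint.items()}

def iterate_to_fixed_point(prior, p_success, tol=1e-12, max_rounds=1000):
    """Condition repeatedly until the distribution (approximately) stops changing."""
    dist = dict(prior)
    for rounds in range(1, max_rounds + 1):
        new = condition_on_success(dist, p_success)
        if max(abs(new[a] - dist[a]) for a in dist) < tol:
            return new, rounds
        dist = new
    return dist, max_rounds

# ASSUMED toy numbers: prior of 1/2 on gambling, one in a million on the lotto.
prior = {"gamble": 0.5, "lotto_right_numbers": 1e-6, "other": 0.5 - 1e-6}
p_success = {"gamble": 1 / 16, "lotto_right_numbers": 1.0, "other": 0.0}

fixed, rounds = iterate_to_fixed_point(prior, p_success)
```

Starting from a prior that gives the best action any positive probability, the iteration ends up concentrated on that action. One caveat of the toy version: a prior that puts all of its mass on "gamble" would also be unchanged by conditioning, so the uniqueness of the fixed point in the quoted claim has to come from properties of logical inductors that this sketch does not model.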

Furthermore, unlike the Monte Carlo algorithms mentioned earlier, logical induction is a theoretically very well-founded theory of bounded rationality. Not so bounded you'd want to run it on an actual computer, granted. But at least it *addresses the question* of what kind of optimality we can enforce on bounded reasoning, rather than just positing a particular kind of computation as the answer.

Since this is equivalent to regular expected utility maximization with logical inductors, there's no reason to use planning-by-inference, but there's also no reason not to.

So, what kind of decision theory does this get us?

- Cooperate in Prisoner's Dilemma with agents whose pseudorandom moves exactly match, or sufficiently correlate with, our own. Defect against agents with uncorrelated pseudorandom exploration sequences (even if they otherwise have "the same mental architecture"). So cooperation is pretty difficult.
- One-box in Newcomb with a perfect predictor. Two-box if the predictor is imperfect. This holds even if the predictor is extremely accurate (say 99.9% accurate), so long as the agent knows more about its own move than the predictor -- the only way the agent will one-box is if the predictor's prediction contains information about the agent's own action which the agent does not possess at the time of choosing.
- Fail transparent Newcomb.
- Fail counterfactual mugging.
- Fail Parfit's Hitchhiker.
- Fail at agent-simulates-predictor.

This was a solid explanation, thanks.

Some differences from what I imagine...

First and foremost, I imagine that the notion of "success" on which the agent conditions is not just a direct translation of "winning" in the decision problem. After all, a lot of the substance of tricky decision theory problems is exactly in that "direct" translation of what-it-means-to-win! Instead, I imagine that the notion of "success" has a lot more supporting infrastructure built into it, and the agent's actions can directly interact with the supporting infrastructure as well...

I don't buy the lottery example. You never encoded the fact that you know tomorrow's numbers. Shouldn't the prior be that you win a million guaranteed if you buy the ticket?


No! You also have to enter the right numbers.
What I'm doing is modeling "gamble with the money" as a simple action - you can imagine there's a big red button that gives you $200 1/16th of the time and takes all your money otherwise.
And then I'm modeling "buy a lotto ticket" as a compound action consisting of entering each number individually.
"Knowing the numbers" means your world model understands that if you've entered the right numbers, you get the money. But it doesn't make "enter the right numbers" probable in the prior.
Of course the conclusion is reversed if we make "enter the right numbers" into a primitive action.


I also didn't understand that. I was thinking of it more like AlphaStar in the sense that your prior is that you're going to continue using your current (probabilistic) policy for all the steps involved in what you're thinking about.
(But not like AlphaStar in that the brain is more likely to do one-or-a-few-steps of rollout with clever hierarchical abstract representations of plans, rather than dozens-of-steps rollouts in a simple one-step-at-a-time way.)


See my answer to Gurkenglas.
My understanding of planning by inference (aka active inference?) is not so much like AlphaStar. More to say here, but I'm out of time atm.

My take on predictive processing is a bit different than the textbooks, and in terms of decision theories, it doesn't wind up radically different from logical inductor decision theory, which Scott talked about in 2017 here, and a bit more here. Or at least, take logical inductor decision theory, make everything about it kinda more qualitative, and subtract the beautiful theoretical guarantees etc.

It's obvious but worth saying anyway that pretty much all the decision theory scenarios that people talk about, like Newcomb's problem, are scenarios where people find themselves unsure what to do, and disagree with each other. Therefore the human brain doesn't give straight answers—or if it does, the answers are not to be found at the "base algorithm" level, but rather the "learned model" level (which can involve metacognition). Or I guess it's possible that the base-algorithm-default and the learned models are pushing in different directions.

Scott's 2017 post gives two problems with this decision theory. In my view humans absolutely suffer from both. Like, my friend always buys the more expensive brand of cereal because he's concerned that he wouldn't like the less expensive brand. But he's never tried it! The parallel to the 5-and-10 problem is obvious, right?

The problem about whether to change the map, territory, or both is something I discussed a bit here. Wishful thinking is a key problem—and just looking at the algorithm as I understand it, it's amazing that humans don't have *even more* wishful thinking than we do. I think wishful thinking is kept mostly under control in a couple ways: (1) self-supervised learning effectively gets a veto over what we can imagine happening, by-and-large preventing highly-implausible future scenarios from even entering consideration in the Model Predictive Control competition; (2) The reward-learning part of the algorithm is restricted to the frontal lobe (home of planning and motor action), not the other lobes (home of sensory processing). (Anatomically, the other lobes have no direct connection to the basal ganglia.) This presumably keeps some healthy separation between understanding sensory inputs and "what you want to see". (I didn't mention that in my post because I only learned about it more recently; maybe I should go back and edit, it's a pretty neat trick.) (3) Actually, wishful thinking *is* wildly out of control in certain domains like post hoc rationalizations. (At least, the ground-level algorithm doesn't do anything to keep it under control. At the learned-model level, it can be kept under control by learned metacognitive memes, e.g. by Reading The Sequences.)

The embedded agency sequence says somewhere that there are still mysteries in human decisionmaking, but (at some risk of my sounding arrogant) I'm not convinced. Everything people do that I can think of, seems to fit together pretty well into the same algorithmic story. I'm very open to discussion about that. Of course, insofar as human decisionmaking has room for improvement, it's worth continuing to think through these issues. Maybe there's a better option that we can use for our AGIs.

Or if not, I guess we can build our human-brain-like AGIs and tell them to Read The Sequences to install a bunch of metacognitive memes in themselves that patch the various problems in their own cognitive algorithms. :-P (Actually, I wrote that as a joke but maybe it's a viable approach...??)

It's obvious but worth saying anyway that pretty much all the decision theory scenarios that people talk about, like Newcomb's problem, are scenarios where people find themselves unsure what to do, and disagree with each other. Therefore the human brain doesn't give straight answers—or if it does, the answers are not to be found at the "base algorithm" level, but rather the "learned model" level (which can involve metacognition).

One point I personally put a lot of weight on: while people are unsure/disagree about particular scenarios, people do mostly seem...


I'm not sure what you're getting at here; you may have a different conception of predictive-processing-like decision theory than I do. I would say "I will get up and go to the store" is a self-consistent model, "I will sit down and read the news" is a self-consistent model, etc. etc. There are always multiple possible self-consistent models—at least one for each possible action that you will take.
Oh, maybe you're taking the perspective where if you're hungry you put a high prior on "I will eat soon". Yeah, I just don't think that's right, or if there's a sensible way to think about it, I haven't managed to get it despite some effort. I think if you're hungry, you want to eat because it leads to a predicted reward, not because you have a prior expectation that you will eat. After all, if you're stuck on a lifeboat in the middle of the ocean, you're hungry but you don't expect to eat. This is an obvious point, frequently brought up, and Friston & colleagues hold strong that it's not a problem for their theory, and I can't make heads or tails of what their counterargument is. I discussed my version (where rewards are also involved) here, and then here I went into more depth for a specific example.

I read you to be asking "what decision theory is implied by predictive processing *as it's implemented in human brains*". It's my understanding that while there are attempts to derive something like a "decision theory formulated entirely in PP terms", there are also serious arguments for the brain actually having systems that are just conventional decision theories and *not* directly derivable from PP.

Let's say you try, as some PP theorists do, to explain all behavior as free energy minimization as opposed to expected utility maximization. Ransom et al. (2020) (current sci-hub) note that this makes it hard to explain cases where the mind acts according to a prediction that has a low probability of being true, but a high cost if it were true.

For example, the sound of rustling grass might be indicative either of the wind or of a lion; if wind is more likely, then predictive processing says that wind should become the predominant prediction. But for your own safety it can be better to predict that it's a lion, just in case. "Predict a lion" is also what standard Bayesian decision theory would recommend, and it seems like the correct solution... but to get that correct solution, you need to import Bayesian decision theory as an extra ingredient, it doesn't fall naturally out of the predictive processing framework.
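A toy calculation makes the gap between "most probable state" and "best action" explicit. All of the probabilities and costs below are invented for illustration; the only thing taken from the example is the structure (wind is likelier, but a lion is far costlier to ignore).

```python
# Hypothetical belief after hearing the grass rustle.
p_state = {"wind": 0.95, "lion": 0.05}

# loss[action][state]: assumed cost of taking the action when the state is true.
loss = {
    "ignore": {"wind": 0.0, "lion": 1000.0},  # ignoring a real lion is catastrophic
    "flee": {"wind": 1.0, "lion": 1.0},       # fleeing wastes a little effort either way
}

def expected_loss(action):
    """Average the loss of an action over the belief about the state."""
    return sum(p_state[s] * loss[action][s] for s in p_state)

# Picking the single most probable hypothesis...
most_probable_state = max(p_state, key=p_state.get)

# ...versus picking the action with the lowest expected loss.
best_action = min(loss, key=expected_loss)
```

With these numbers the most probable hypothesis is wind, yet the action that minimizes expected loss is to flee. The loss table is the extra ingredient that, on this reading, has to be supplied from outside a pure "adopt the most probable prediction" scheme.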

That sounds to me like PP, or at least PP as it exists, is something that's compatible with implementing different decision theories, rather than one that implies a specific decision theory by itself.

That sounds to me like PP, or at least PP as it exists, is something that's compatible with implementing different decision theories, rather than one that implies a specific decision theory by itself.

I generally agree with this. Specifically, I tend to imagine that PP is trying to make our behavior match a model in which we behave like an agent (at least sometimes). Thus, for instance, the tendency for humans to do things which "look like" or "feel like" optimizing for X without actually optimizing for X.

In that case, PP would be consistent with many decision theories, depending on the decision theory used by the model it's trying to match.

Academic philosophers sometimes talk about how beliefs have a mind-to-world direction of fit whereas desires have a world-to-mind direction of fit. Perhaps they even define the distinction that way, I don't remember.

A quick google search didn't turn up anything interesting but I think there might be some interesting papers in there if you actually looked. Not sure though.

Similarly, in decision theory literature there is this claim that "deliberation screens off prediction." That seems relevant somehow. If it's true it might be true for reasons unrelated to predictive processing, but I suspect there is a connection...

At a fairly abstract/stylized level, predictive processing models human cognition and behavior as always minimizing predictive error. Sometimes, the environment is "fixed" and our internal models are updated to match it - e.g. when I see my untied shoelace, my internal model updates to include an untied shoelace. Other times, our internal model is "fixed", and we act on the environment to make it better match the model - e.g. "wanting food" is internally implemented as a strong expectation that I'm going to eat soon, which in turn makes me seek out food in order to make that expectation true. Rather than having a utility function that values food or anything like that, the decision theory implied by predictive processing just has a model in which we obtain food, and we try to make the model match reality.

Abstracting out the key idea: we pack all of the complicated stuff into our world-model, hardcode some things into our world-model which we *want* to be true, then generally try to make the model match reality.

While making the model match reality, there will be knobs we can turn both "in the model" (i.e. updates) and "in reality" (i.e. actions); there's no hard separation between the two. There will be things in both map and reality which we can change, and there will be things in both map and reality which we can't change. It's all treated the same. At first glance, that looks potentially quite useful for embedded agency.

(My own interest in this was piqued partly because a predictive-processing-like decision theory seems likely to produce abstraction boundaries which look like Cartesian boundaries. As in that post, it seems like some of the intuitive arguments we make around decision theories would naturally drop out of a predictive-processing-like decision theory.)

What problems does such a decision theory run into? What sort of things can we hardcode into our world-model without breaking it altogether? What things must be treated as "fixed" when making the model match reality? Does such an approach have any "invariant" implications, i.e. implications independent of *which* model we're trying to match? What further requirements are there on the target model in order for a predictive-processing-style agent to have "good" behavior, in the ways characterized by other decision theories?

This is intended to be an open-ended research question, but off-the-cuff thoughts and links to relevant work are welcome.