(This originated as a comment on the post "Embedded World-Models," but it makes a broadly applicable point and is substantial enough to be a post, so I thought I'd make it a post as well.)


This post feels quite similar to things I have written in the past to justify my lack of enthusiasm about idealizations like AIXI and logically-omniscient Bayes. But I would go further: I think that grappling with embeddedness properly will inevitably make theories of this general type irrelevant or useless, so that "a theory like this, except for embedded agents" is not a thing that we can reasonably want. To specify what I mean, I'll use this paragraph as a jumping-off point:

Embedded agents don’t have the luxury of stepping outside of the universe to think about how to think. What we would like would be a theory of rational belief for situated agents which provides foundations that are similarly as strong as the foundations Bayesianism provides for dualistic agents.

Most "theories of rational belief" I have encountered -- including Bayesianism in the sense I think is meant here -- are framed at the level of an evaluator outside the universe, and have essentially no content when we try to transfer them to individual embedded agents. This is because these theories tend to be derived in the following way:

  • We want a theory of the best possible behavior for agents.
  • We have some class of "practically achievable" strategies $S$, which can actually be implemented by agents. We note that an agent's observations provide some information about the quality of different strategies $s \in S$. So if it were possible to follow a rule like $R \equiv$ "find the best $s \in S$ given your observations, and then follow that $s$," this rule would spit out very good agent behavior.
  • Usually we soften this to a performance-weighted average rather than a hard argmax, but the principle is the same: if we could search over all of $S$, the rule $R$ that says "do the search and then follow what it says" can be competitive with the very best $s \in S$. (Trivially so, since it has access to the best strategies, along with all the others.)
  • But usually $R \notin S$. That is, the strategy "search over all practical strategies and follow the best ones" is not a practical strategy. But we argue that this is fine, since we are constructing a theory of ideal behavior. It doesn't have to be practically implementable.
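
As a toy illustration of the schema above, here is a minimal sketch in which the strategy class is a small finite list of bit predictors and the rule is a performance-weighted average over them. Everything in it (the names `Strategy`, `rule_R`, and `log_score`, the three toy strategies, and the choice of log-likelihood as the performance measure) is an assumption made for illustration, not something taken from the post. With a finite class the rule is of course perfectly implementable, which is exactly the property the idealized theories give up.

```python
import math
from typing import Callable, List, Sequence

# A "strategy" maps the bits seen so far to a probability that the next bit is 1.
Strategy = Callable[[Sequence[int]], float]

def always_half(history: Sequence[int]) -> float:
    return 0.5

def mostly_ones(history: Sequence[int]) -> float:
    return 0.9

def laplace(history: Sequence[int]) -> float:
    # Laplace's rule of succession: (number of ones + 1) / (n + 2).
    return (sum(history) + 1) / (len(history) + 2)

def log_score(strategy: Strategy, history: Sequence[int]) -> float:
    """Total log-likelihood the strategy assigned to the observed bits."""
    total = 0.0
    for t, bit in enumerate(history):
        p = strategy(history[:t])
        total += math.log(p if bit == 1 else 1.0 - p)
    return total

def rule_R(strategies: List[Strategy], history: Sequence[int]) -> float:
    """Performance-weighted average of the strategies' next-bit predictions."""
    scores = [log_score(s, history) for s in strategies]
    top = max(scores)
    # Subtract the best score before exponentiating, for numerical stability.
    weights = [math.exp(sc - top) for sc in scores]
    z = sum(weights)
    return sum(w / z * s(history) for w, s in zip(weights, strategies))

S = [always_half, mostly_ones, laplace]  # hypothetical toy strategy class
observed = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
prediction = rule_R(S, observed)
```

Note that `rule_R` has to run every member of `S` over the whole history, so its cost grows with the size of the class; that is the step that stops being practical once the class contains everything practical.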

For example, in Solomonoff, $S$ is defined by computability while $R$ is allowed to be uncomputable. In the LIA construction, $S$ is defined by polytime complexity while $R$ is allowed to run slower than polytime. In logically-omniscient Bayes, finite sets of hypotheses can be manipulated in a finite universe but the full Boolean algebra over hypotheses generally cannot (N.B. I don't think this last case fits my schema quite as well as the other two).

I hope the framework I've just introduced helps clarify what I find unpromising about these theories. By construction, any agent you can actually design and run is a single element of $S$ (a "practical strategy"), so every fact about rationality that can be incorporated into agent design gets "hidden inside" the individual $s \in S$, and the only things you can learn from the "ideal theory" are things which can't fit into a practical strategy.

For example, suppose (reasonably) that model averaging and complexity penalties are broadly good ideas that lead to good results. But all of the model averaging and complexity penalization that can be done computably happens inside some Turing machine or other, at the level "below" Solomonoff. Thus Solomonoff only tells you about the extra advantage you can get by doing these things uncomputably. Any kind of nice Bayesian average over Turing machines that can happen computably is (of course) just another Turing machine.
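
The closure point can be made concrete with a small sketch. The names `Predictor` and `mixture`, and the use of fixed weights rather than weights updated on evidence, are simplifying assumptions for illustration only. The thing to notice is the return type: averaging predictors yields another object of the same type, which can be dropped straight back into the class being averaged over.

```python
from typing import Callable, List, Sequence

# A predictor maps the bits seen so far to a probability that the next bit is 1.
Predictor = Callable[[Sequence[int]], float]

def mixture(predictors: List[Predictor], weights: List[float]) -> Predictor:
    """Return a fixed-weight average of predictors, which is itself a Predictor."""
    total = sum(weights)

    def mixed(history: Sequence[int]) -> float:
        return sum(w * p(history) for w, p in zip(weights, predictors)) / total

    return mixed

def coin(history: Sequence[int]) -> float:
    return 0.5

def optimist(history: Sequence[int]) -> float:
    return 0.9

m1 = mixture([coin, optimist], [1.0, 1.0])
# m1 is an ordinary member of the base class, so it can be averaged over again.
m2 = mixture([m1, coin], [3.0, 1.0])
```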

This also explains why I find it misleading to say that good practical strategies constitute "approximations to" an ideal theory of this type. Of course, since $R$ just says to follow the best strategies in $S$, if you are following a very good strategy in $S$ your behavior will tend to be close to that of $R$. But this cannot be attributed to any of the searching over $S$ that $R$ does, since you are not doing a search over $S$; you are executing a single member of $S$ and ignoring the others. Any searching that can be done practically collapses down to a single practical strategy, and any that doesn't is not practical.

Concretely, this talk of approximations is like saying that a very successful chess player "approximates" the rule "consult all possible chess players, then weight their moves by past performance." Yes, the skilled player will play similarly to this rule, but they are not following it, not even approximately! They are only themselves, not any other player.

Any theory of ideal rationality that wants to be a guide for embedded agents will have to be constrained in the same ways the agents are. But theories of ideal rationality usually get all of their content by going to a level above the agents they judge. So this new theory would have to be a very different sort of thing.


To state all this more pithily: if we design the search space to contain everything feasible, then rationality-as-search has no feasible implications. If rationality-as-search is to have feasible implications, then the search space must be weak enough for there to be something feasible that is not a point in the search space.

4 comments

This seems broadly right to me, but it seems to me like metaheuristics (in the numerical optimization sense) are practical and have a structure like the one that you're describing. Neural architecture search is the name people are using for this sort of thing in contemporary ML.

What's different between them and the sort of thing you describe? Well, for one, the softening is even stronger; rather than a performance-weighted average across all strategies, it's a performance-weighted sampling strategy that has access to all strategies (but will only actually evaluate a small subset of them). But it seems like the core strategy--be both doing object-level cognition and meta-level cognition about how you're doing object-level cognition--is basically the same.
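
To make "has access to all strategies but only evaluates a small subset" concrete, here is a minimal sketch of one such sampler. It uses Boltzmann-style (softmax) sampling over running score estimates, which is a choice made purely for illustration and not a description of how neural architecture search or any particular metaheuristic actually works; `sample_strategies`, `toy_evaluate`, `prior_mean`, and the toy quality function are all hypothetical.

```python
import math
import random
from typing import Callable, List

def sample_strategies(
    evaluate: Callable[[int], float],
    n_strategies: int,
    budget: int,
    temperature: float = 0.1,
    prior_mean: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Spend `budget` evaluations, picking each strategy with probability
    proportional to exp(estimated_score / temperature).

    Returns how many times each strategy index was evaluated.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_strategies
    counts = [0] * n_strategies
    for _ in range(budget):
        # Untried strategies fall back to a neutral prior estimate.
        means = [
            totals[i] / counts[i] if counts[i] else prior_mean
            for i in range(n_strategies)
        ]
        top = max(means)  # subtract the max before exponentiating, for stability
        weights = [math.exp((m - top) / temperature) for m in means]
        i = rng.choices(range(n_strategies), weights=weights, k=1)[0]
        totals[i] += evaluate(i)
        counts[i] += 1
    return counts

# Hypothetical toy problem: strategy i has true quality i / 999, seen with noise.
noise = random.Random(1)

def toy_evaluate(i: int) -> float:
    return i / 999 + noise.gauss(0.0, 0.05)

counts = sample_strategies(toy_evaluate, n_strategies=1000, budget=50)
never_tried = counts.count(0)
```

With 1000 candidates and a budget of 50 evaluations, at least 950 candidates are never evaluated at all, yet every one of them had nonzero probability of being picked at each step.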

It remains unclear to me whether the right way to find these meta-strategies is something like "start at the impractical ideal and rescue what you can" or "start with something that works and build new features"; it seems like modern computational Bayesian methods look more like the former than the latter. When I think about how to describe human epistemology, it seems like computationally bounded Bayes is a promising approach (where probabilities change both by the standard updates among hypotheses that already exist, and new operations to be formalized to add or remove hypotheses; you want to be able to capture "Why didn't you assign high probability to X?" "Because I didn't think of it; now that I have, I do."). But of course I'm using my judgment that already works to consider adding new features here, rather than having built how to think out of rescuing what I can from the impractical ideal of how to think.

But it seems like the core strategy--be both doing object-level cognition and meta-level cognition about how you're doing object-level cognition--is basically the same.
It remains unclear to me whether the right way to find these meta-strategies is something like "start at the impractical ideal and rescue what you can" or "start with something that works and build new features"; it seems like modern computational Bayesian methods look more like the former than the latter.

I'd argue that there's usually a causal arrow from practical lore to impractical ideals first, even if the ideals also influence practice at a later stage. Occam's Razor came before Solomonoff; "change your mind when you see surprising new evidence" came before formal Bayes. The "core strategy" you refer to sounds like "do both exploration and exploitation," which is the sort of idea I'd imagine goes back millennia (albeit not in those exact terms).

One of my goals in writing this post was to formalize the feeling I get, when I think about an idealized theory of this kind, that it's a "redundant step" added on top of something that already does all the work by itself -- like taking a decision theory and appending the rule "take the actions this theory says to take." But rather than being transparently vacuous, like that example, they are vacuous in a more hidden way, and the redundant steps they add tend to resemble legitimately good ideas familiar from practical experience.

Consider the following (ridiculous) theory of rationality: "do the most rational thing, and also, remember to stay hydrated :)". In a certain inane sense, most rational behavior "conforms to" this theory, since the theory parasitizes on whatever existing notion of rationality you had, and staying hydrated is generally a good idea and thus does not tend to conflict with rationality. And whenever staying hydrated is a good idea, one could imagine pointing to this theory and saying "see, there's the hydration theory of rationality at work again." But, of course, none of this should actually count in the "hydration theory's" favor: all the real work is hidden in the first step ("do the most rational thing"), and insofar as hydration is rational, there's no need to specify it explicitly. This doesn't quite map onto the schema, but captures the way in which I think these theories tend to confuse people.

If the more serious ideals we're talking about are like the "hydration theory," we'd expect them to have the appearance of explaining existing practical methods, and of retrospectively explaining the success of new methods, while not being very useful for generating any new methods. And this seems generally true to me: there's a lot of ensemble-like or regularization-like stuff in ML that can be interpreted as Bayesian averaging/updating over some base space of models, but most of the excitement in ML is in these base spaces. We didn't get neural networks from Bayesian first principles.

I think that this is a slightly wrong account of the case for Solomonoff induction. The claim is not just that Solomonoff induction predicts computable environments better than computable predictors, but rather that the Solomonoff prior is an enumerable semimeasure that is also a mixture over every enumerable semimeasure, and therefore predicts computable environments at least as well as any other enumerable semimeasure. So, using your notation, $R \in S$. It still fails as a theory of embedded agency, since it only predicts computable environments, but it's not true that we must only compare it to prediction strategies strictly weaker than itself. The paper (Non-)Equivalence of Universal Priors has a decent discussion of this.
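
For reference, the "at least as well as" claim follows from the mixture form in one line. The symbols below ($\mathcal{M}$ for the class of enumerable semimeasures, $M$ for the universal mixture, $w_\nu$ for its prior weights) are notation assumed here for illustration, not taken from the comment or the post:

```latex
% Assumed notation: \mathcal{M} is the class of enumerable semimeasures,
% M is the universal mixture, and w_\nu > 0 are its prior weights.
\[
  M(x) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x)
  \quad\Longrightarrow\quad
  M(x) \;\ge\; w_\nu \, \nu(x)
  \ \text{ for every } \nu \in \mathcal{M} \text{ and every } x,
\]
\[
  \text{so}\quad -\log M(x) \;\le\; -\log \nu(x) + \log \frac{1}{w_\nu}.
\]
```

The extra log-loss relative to any single $\nu$ is bounded by a constant that depends on $\nu$ but not on the data $x$, and since $M$ is itself enumerable the comparison class includes $M$.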

Although it's also worth noting that as per Theorem 16 of the above paper, not all universally dominant enumerable semimeasures are versions of the Solomonoff prior, so there's the possibility that the Solomonoff prior only does well by finding a good non-Solomonoff distribution and mimicking that.