Introduction To The Infra-Bayesianism Sequence

7Rohin Shah

2Vanessa Kosoy

3Rohin Shah

3Vanessa Kosoy

2Rohin Shah

3Vanessa Kosoy

3Diffractor

2Rohin Shah

1DanielFilan

1Vanessa Kosoy

1DanielFilan

2Vanessa Kosoy

2Diffractor

1DanielFilan

6Diffractor

3Richard Ngo

1Diffractor

1Charlie Steiner

2Vanessa Kosoy

1Diffractor

1Charlie Steiner

1Diffractor

2Charlie Steiner

1Diffractor

4DanielFilan

2Vanessa Kosoy

2DanielFilan

3michaelcohen

3Vanessa Kosoy

1Vanessa Kosoy

1michaelcohen

1Vanessa Kosoy

2Charlie Steiner

1DanielFilan

2Stuart Armstrong

1Vanessa Kosoy

2Stuart Armstrong

1Vanessa Kosoy

1Diffractor

2Stuart Armstrong

2Charlie Steiner

5Vanessa Kosoy

1Donald Hobson

3Vanessa Kosoy

0awenonian

1Diffractor

0awenonian

1Diffractor

0CronoDAS

2Vanessa Kosoy

1Diffractor

1[comment deleted]

New Comment

Planned summary for the Alignment Newsletter:

I have finally understood this sequence enough to write a summary about it, thanks to [AXRP Episode 5](

https://www.alignmentforum.org/posts/FkMPXiomjGBjMfosg/axrp-episode-5-infra-bayesianism-with-vanessa-kosoy). Think of this as a combined summary + highlight of the sequence and the podcast episode.The central problem of <@embedded agency@>(@Embedded Agents@) is that there is no clean separation between an agent and its environment: rather, the agent is _embedded_ in its environment, and so when reasoning about the environment it is reasoning about an entity that is “bigger” than it (and in particular, an entity that _contains_ it). We don’t have a good formalism that can account for this sort of reasoning. The standard Bayesian account requires the agent to have a space of precise hypotheses for the environment, but then the true hypothesis would also include a precise model of the agent itself, and it is usually not possible to have an agent contain a perfect model of itself.

A natural idea is to reduce the precision of hypotheses. Rather than requiring a hypothesis to assign a probability to every possible sequence of bits, we now allow the hypotheses to say “I have no clue about this aspect of this part of the environment, but I can assign probabilities to the rest of the environment”. The agent can then limit itself to hypotheses that don’t make predictions about the part of the environment that corresponds to the agent, but do make predictions about other parts of the environment.

Another way to think about it is that it allows you to start from the default of “I know nothing about the environment”, and then add in details that you do know to get an object that encodes the easily computable properties of the environment you can exploit, while not making any commitments about the rest of the environment.

Of course, so far this is just the idea of using [Knightian uncertainty](

https://en.wikipedia.org/wiki/Knightian_uncertainty). The contribution of infra-Bayesianism is to show how to formally specify a decision procedure that uses Knightian uncertainty, while still satisfying many properties we would like a decision procedure to satisfy. You can thus think of it as an extension of the standard Bayesian account of decision making to the setting in which the agent cannot represent the true environment as a hypothesis over which it can reason.Imagine that, instead of having a probability distribution over hypotheses, we instead have two “levels”: first are all the properties we have Knightian uncertainty over, and then are all the properties we can reason about. For example, imagine that the environment is an infinite sequence of bits, and we want to say that all the even bits come from flips of a possibly biased coin, but we know nothing about the odd coin flips. Then, at the top level, we have a separate branch for each possible setting of the odd coin flips. At the second level, we have a separate branch for each possible bias of the coin. At the leaves, we have the hypothesis “the odd bits are as set by the top level, and the even bits are generated from coin flips with the bias set by the second level”.

(Yes, there are lots of infinite quantities in this example, so you couldn’t implement it the way I’m describing it here. An actual implementation would not represent the top level explicitly and would use computable functions to represent the bottom level. We’re not going to worry about this for now.)

If we were using orthodox Bayesianism, we would put a probability distribution over the top level, and a probability distribution over the bottom level. You could then multiply that out to get a single probability distribution over the hypotheses, which is why we don’t do this separation into two levels in orthodox Bayesianism. (Also, just to reiterate, the _whole point_ is that we can’t put a probability distribution at the top level, since that implies e.g. making precise predictions about an environment that is bigger than you are.)

Infra-Bayesianism says, “what if we just… not put a probability distribution over the top level?” Instead, we have a set of probability distributions over hypotheses, and Knightian uncertainty over which distribution in this set is the right one. A common suggestion for Knightian uncertainty is to do _worst-case_ reasoning, so that’s what we’ll do at the top level. Lots of problems immediately crop up, but it turns out we can fix them.

First, let’s say your top level consists of two distributions over hypotheses, A and B. You then observe some evidence E, which A thought was 50% likely and B thought was 1% likely. Intuitively, you want to say that this makes A “more likely” relative to B than we previously thought. But how can you do this if you have Knightian uncertainty and are just planning to do worst-case reasoning over A and B? The solution here is to work with _unnormalized_ probability distributions at the second level. Then, in the case above, we can just scale the “probabilities” in both A and B by the likelihood assigned to E. We _don’t_ normalize A and B after doing this scaling.

But now what exactly do the numbers mean, if we’re going to leave these distributions unnormalized? Regular probabilities only really make sense if they sum to 1. We can take a different view on what a “probability distribution” is -- instead of treating it as an object that tells you how _likely_ various hypotheses are, treat it as an object that tells you how much we _care_ about particular hypotheses. (See [related](

https://www.lesswrong.com/posts/J7Gkz8aDxxSEQKXTN/what-are-probabilities-anyway) <@posts@>(@An Orthodox Case Against Utility Functions@).) So scaling down the “probability” of a hypothesis just means that we care less about what that hypothesis “wants” us to do.This would be enough if we were going to take an average over A and B to make our final decision. However, our plan is to do worst-case reasoning at the top level. This interacts horribly with our current proposal: when we scale hypotheses in A by 0.5 on average, and hypotheses in B by 0.01 on average, the minimization at the top level is going to place _more_ weight on B, since B is now _more_ likely to be the worst case. Surely this is wrong?

What’s happening here is that B gets most of its expected utility in worlds where we observe different evidence, but the worst-case reasoning at the top level doesn’t take this into account. Before we update, since B assigned 1% to E, the expected utility of B is given by 0.99 * expected utility given not-E + 0.01 * expected utility given E. After the update, the second part remains but the first part disappears, which makes the worst-case reasoning wonky. So what we do is we keep track of the first part as well, and make sure that our worst-case reasoning takes it into account.

This gives us **infradistributions**: sets of (m, b) pairs, where m is an unnormalized probability distribution and b corresponds to “the value we would have gotten if we had seen different evidence”. When we observe some evidence E, the hypotheses within m are scaled by the likelihood they assign to E, and b is updated to include the value we would have gotten in the world where we saw anything other than E. Note that it is important to specify the utility function for this to make sense, as otherwise it is not clear how to update b. To compute utilities for decision-making, we do worst-case reasoning over the (m, b) pairs, where we use standard expected values within each m. We can prove that this update rule satisfies _dynamic consistency_: if initially you believe “if I see X, then I want to do Y”, then after seeing X, you believe “I want to do Y”.

So what can we do with infradistributions? Our original motivation was to talk about embedded agency, so a natural place to start is with decision theory problems in which the environment contains a perfect predictor of the agent, such as in Newcomb’s problem. Unfortunately, we can’t immediately write this down with infradistributions, because we have no way of (easily) formally representing “the environment perfectly predicts my actions”. One trick we can use is to consider hypotheses in which the environment just spits out some action, without the constraint that it must match the agent’s action. We then modify the utility function to give infinite utility when the prediction is incorrect. Since we do worst-case reasoning, the agent will effectively act as though this situation is impossible. With this trick, infra-Bayesianism performs similarly to UDT on a variety of challenging decision problems.

Planned opinion:

This seems pretty cool, though I don’t understand it that well yet. While I don’t yet feel like I have a better philosophical understanding of embedded agency (or its subproblems), I do think this is significant progress along that path.

In particular, one thing that feels a bit odd to me is the choice of worst-case reasoning for the top level -- I don’t really see anything that _forces_ that to be the case. As far as I can tell we could get all the same results by using best-case reasoning instead (assuming we modified the other aspects appropriately). The obvious justification for worst-case reasoning is that it is a form of risk aversion, but it doesn’t feel like that is really sufficient -- risk aversion in humans is pretty different from literal worst-case reasoning, and also none of the results in the post seem to depend on risk aversion.

I wonder whether the important thing is just that we don’t do expected value reasoning at the top level, and there are in fact a wide variety of other kinds of decision rules that we could use that could all work. If so, it seems interesting to characterize what makes some rules work while others don’t. I suspect that would be a more philosophically satisfying answer to “how should agents reason about environments that are bigger than them”.

The central problem of <@embedded agency@>(@Embedded Agents@) is that there is no clean separation between an agent and its environment...

That's certainly one way to motivate IB, however I'd like to note that even if there *was* a clean separation between an agent and its environment, it could still be the case that the environment cannot be precisely modeled by the agent due to its computational complexity (in particular this must be the case if the environment contains other agents of similar or greater complexity).

The contribution of infra-Bayesianism is to show how to formally specify a decision procedure that uses Knightian uncertainty, while still satisfying many properties we would like a decision procedure to satisfy.

Well, the use of Knightian uncertainty (imprecise probability) in decision theory certain appeared in the literature, so it would be more fair to say that the contribution of IB is combining that with *reinforcement learning theory* (i.e. treating *sequential* decision making and considering learnability and regret bounds in this setting) and applying that to various other questions (in particular, Newcombian paradoxes).

In particular, one thing that feels a bit odd to me is the choice of worst-case reasoning for the top level -- I don’t really see anything that

forcesthat to be the case. As far as I can tell we could get all the same results by using best-case reasoning instead (assuming we modified the other aspects appropriately).

The reason we use worst-case reasoning is because we want the agent to satisfy certain *guarantees*. Given a learnable class of infra-hypotheses, in the limit, we can guarantee that whenever the true environment satisfies one of those hypotheses, the agent attains at least the corresponding amount of expected utility. You don't get anything analogous with best-case reasoning.

Moreover, there is an (unpublished) theorem showing that virtually any guarantee you might want to impose can be written in IB form. That is, let be the space of environments, and let be an increasing sequence of functions. We can interpret every as a requirement about the policy: . These requirements become stronger with increasing . We might then want to be s.t. it satisfies the requirement with the highest possible. The theorem then says that (under some mild assumptions about the functions ) there exists an *infra-environment* s.t. optimizing for it is equivalent to maximizing . (We can replace by a continuous parameter, I made it discrete just for ease of exposition.)

The obvious justification for worst-case reasoning is that it is a form of risk aversion, but it doesn’t feel like that is really sufficient -- risk aversion in humans is pretty different from literal worst-case reasoning, and also none of the results in the post seem to depend on risk aversion.

Actually it might be not that different. The Legendre-Fenchel duality shows you can think of infradistributions as just concave expectation functionals, which seems as a fairly general way to add risk-aversion to decision theory. It is also used in mathematical economics, see Peng.

it seems interesting to characterize what makes some rules work while others don’t.

Another rule which is tempting to use (and is known in the literature) is minimax-regret. However, it's possible to show that if you allow your hypotheses to depend on the utility function then you can reduce it to ordinary maximin.

I'd like to note that even if there was a clean separation between an agent and its environment, it could still be the case that the environment cannot be precisely modeled by the agent due to its computational complexity

Yeah, agreed. I'm intentionally going for a simplified summary that sacrifices details like this for the sake of cleaner narrative.

it would be more fair to say that the contribution of IB is combining that with reinforcement learning theory

Ah, whoops. Live and learn.

The reason we use worst-case reasoning is because we want the agent to satisfy certain guarantees. Given a learnable class of infra-hypotheses, in the γ→1

limit, we can guarantee that whenever the true environment satisfies one of those hypotheses, the agent attains at least the corresponding amount of expected utility. You don't get anything analogous with best-case reasoning.

Okay, that part makes sense. Am I right though that in the case of e.g. Newcomb's problem, if you use the anti-Nirvana trick (getting -infinity reward if the prediction is wrong), then you would still recover the same behavior (EDIT: if you also use best-case reasoning instead of worst-case reasoning)? (I think I was a bit too focused on the specific UDT / Nirvana trick ideas.)

Actually it might be not that different. The Legendre-Fenchel duality shows you can think of infradistributions as just concave expectation functionals, which seems as a fairly general way to add risk-aversion to decision theory.

Yeah... I'm a bit confused about this. If you imagine choosing any concave expectation functional, then I agree that can model basically any type of risk aversion. But it feels like your infra-distribution should "reflect reality" or something along those lines, which is an extra constraint. If there's a "reflect reality" constraint and a "risk aversion" constraint and these are completely orthogonal, then it seems like you can't necessarily satisfy both constraints at the same time.

On the other hand, maybe if I thought about it for longer, I'd realize that the things we think of as "risk aversion" are actually identical to the "reflect reality" constraint when we are allowed to have Knightian uncertainty over some properties of the environment. In that case I would no longer have my objection.

To be a bit more concrete: imagine that you know that the even bits in an infinite bitsequence come from a fair coin, but the odd bits come from some other agent, where you can't model them exactly but you have some suspicion that they are a bit more likely to choose 1 over 0. Risk aversion might involve making a small bet that you'd see a 1 rather than a 0 in some specific odd bit (smaller than what EU maximization / Bayesian decision theory would recommend), but "reflecting reality" might recommend having Knightian uncertainty about the output of the agent which would mean never making a bet on the outputs of the odd bits.

I am curious what happens in this scenario if you set the concave expectation functional based on the "risk aversion" setting above, and then use duality to get the "convex set of distributions" formulation -- would the resulting object be meaningful to us?

Am I right though that in the case of e.g. Newcomb's problem, if you use the anti-Nirvana trick (getting -infinity reward if the prediction is wrong), then you would still recover the same behavior (EDIT: if you also use best-case reasoning instead of worst-case reasoning)?

Yes

imagine that you know that the even bits in an infinite bitsequence come from a fair coin, but the odd bits come from some other agent, where you can't model them exactly but you have some suspicion that they are a bit more likely to choose 1 over 0. Risk aversion might involve making a small bet that you'd see a 1 rather than a 0 in some specific odd bit (smaller than what EU maximization / Bayesian decision theory would recommend), but "reflecting reality" might recommend having Knightian uncertainty about the output of the agent which would mean never making a bet on the outputs of the odd bits.

I think that if you are offered a single bet, your utility is linear in money and your belief is a *crisp* infradistribution (i.e. a closed convex set of probability distributions) then it is always optimal to bet either as much as you can or nothing at all. But for more general infradistributions this need not be the case. For example, consider and take the set of a-measures generated by and . Suppose you start with dollars and can bet any amount on any outcome at even odds. Then the optimal bet is betting dollars on the outcome , with a value of dollars.

But for more general infradistributions this need not be the case. For example, consider and take the set of a-measures generated by and . Suppose you start with dollars and can bet any amount on any outcome at even odds. Then the optimal bet is betting dollars on the outcome , with a value of dollars.

I guess my question is more like: shouldn't there be some aspect of reality that determines what my set of a-measures is? It feels like here we're finding a set of a-measures that rationalizes my behavior, as opposed to choosing a set of a-measures based on the "facts" of the situation and then seeing what behavior that implies.

I feel like we agree on what the technical math says, and I'm confused about the philosophical implications. Maybe we should just leave the philosophy alone for a while.

IIUC your question can be reformulated as follows: a crisp infradistribution can be regarded as a claim about reality (the true distribution is inside the set), but it's not clear how to generalize this to non-crisp. Well, if you think in terms of desiderata, then crisp says: if distribution is inside set then we have some lower bound on expected utility (and if it's not then we don't promise anything). On the other hand non-crisp gives a lower bound that is *variable* with the true distribution. We can think of non-crisp infradistirbutions as being *fuzzy* properties of the distribution (hence the name "crisp"). In fact, if we restrict ourselves to either of homogenous, cohomogenous or c-additive infradistributions, then we actually have a formal way to assign membership functions to infradistirbutions, i.e. literally regard them as fuzzy sets of distributions (which ofc have to satisfy some property analogous to convexity).

If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.

For the concave expectation functionals: Well, there's another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. Ie, all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment works and assuming you get worst-case outcomes from the parts you can't predict.

For your concrete example, that's why you have multiple hypotheses that are learnable. Sure, one of your hypotheses might have complete knightian uncertainty over the odd bits, but another hypothesis might not. Betting on the odd bits is advised by a more-informative hypothesis, for sufficiently good bets. And the policy selected by the agent would probably be something like "bet on the odd bits occasionally, and if I keep losing those bets, stop betting", as this wins in the hypothesis where some of the odd bits are predictable, and doesn't lose too much in the hypothesis where the odd bits are completely unpredictable and out to make you lose.

If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.

Sorry, I meant the combination of best-case reasoning (sup instead of inf) and the anti-Nirvana trick. In that case the agent goes "Murphy won't mispredict, since then I'd get -infinity reward which can't be the best that I do".

For your concrete example, that's why you have multiple hypotheses that are learnable.

Hmm, that makes sense, I think? Perhaps I just haven't really internalized the learning aspect of all of this.

One thing I realized after the podcast is that because the decision theory you get can only handle pseudo-causal environments, it's basically trying to think about the statistics of environments rather than their internals. So my guess is that further progress on transparent newcomb is going to have to look like adding in the right kind of logical uncertainty or something. But basically it unsurprisingly has more of a statistical nature than what you imagine you want reading the FDT paper.

it's basically trying to think about the statistics of environments rather than their internals

That's not really true because the structure of infra-environments reflects the structure of those Newcombian scenarios. This means that the *sample complexity* of learning them will likely scale with their intrinsic complexity (e.g. some analogue of RVO dimension). This is different from treating the environment as a black-box and converging to optimal behavior by pure trial and error, which would yield much worse sample complexity.

I agree that infra-bayesianism isn't just thinking about sampling properties, and maybe 'statistics' is a bad word for that. But the failure on transparent Newcomb without kind of hacky changes to me suggests a focus on "what actions look good thru-out the probability distribution" rather than on "what logically-causes this program to succeed".

There is some truth in that, in the sense that, your beliefs must take a form that is *learnable* rather than just a god-given system of logical relationships.

There's actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things, I can send you a draft if you want. "Belief Functions and Decision Theory" is a post that hasn't held up nearly as well to time as "Basic Inframeasure Theory".

This post is still endorsed, it still feels like a continually fruitful line of research. A notable aspect of it is that, as time goes on, I keep finding more connections and crisper ways of viewing things which means that for many of the further linked posts about inframeasure theory, I think I could explain them from scratch better than the existing work does. One striking example is that the "Nirvana trick" stated in this intro (to encode nonstandard decision-theory problems), has transitioned from "weird hack that happens to work" to "pops straight out when you make all the math as elegant as possible". Accordingly, I'm working on a "living textbook" (like a textbook, but continually being updated with whatever cool new things we find) where I try to explain everything from scratch in the crispest way possible, to quickly catch up on the frontier of what we're working on. That's my current project.

I still do think that this is a large and tractable vein of research to work on, and the conclusion hasn't changed much.

I'm feeling very excited about this agenda. Is there currently a publicly-viewable version of the living textbook? Or any more formal writeup which I can include in my curriculum? (If not I'll include this post, but I expect many people would appreciate a more polished writeup.)

If you're looking for curriculum materials, I believe that the most useful reference would probably be my "Infra-exercises", a sequence of posts containing all the math exercises you need to reinvent a good chunk of the theory yourself. Basically, it's the textbook's exercise section, and working through interesting math problems and proofs on one's own has a *much* better learning feedback loop and retention of material than slogging through the old posts. The exercises are short on motivation and philosophy compared to the posts, though, much like how a functional analysis textbook takes for granted that you want to learn functional analysis and doesn't bother motivating it.

The primary problem is that the exercises aren't particularly calibrated in terms of difficulty, and in order for me to get useful feedback, someone has to actually work through all of them, so feedback has been a bit sparse. So I'm stuck in a situation where I keep having to link *everyone* to the infra-exercises over and over and it'd be really good to just get them out and publicly available, but if they're as important as I think, then the best move is something like "release them one at a time and have a bunch of people work through them as a group" like the fixpoint exercises, instead of "just dump them all as public documents".

I'll ask around about speeding up the public - ation of the exercises and see what can be done there.

I'd strongly endorse linking this introduction even if the exercises are linked as well, because this introduction serves as the table of contents to all the other applicable posts.

I'm confused about the Nirvana trick then. (Maybe here's not the best place, but oh well...) Shouldn't it break the instant you do anything with your Knightian uncertainty other than taking the worst-case?

Well, taking worst-case uncertainty is what infradistributions do. Did you have anything in mind that can be done with Knightian uncertainty besides taking the worst-case (or best-case)?

And if you were dealing with best-case uncertainty instead, then the corresponding analogue would be assuming that you go to hell if you're mispredicted (and then, since best-case things happen to you, the predictor must accurately predict you).

What if you assumed the stuff you had the hypothesis about was independent of the stuff you have Knightian uncertainty about (until proven otherwise)?

E.g. if you're making hypotheses about a multi-armed bandit and the world also contains a meteor that might smash through your ceiling and kill you at any time, you might want to just say "okay, ignore the meteor, pretend my utility has a term for gambling wins that doesn't depend on the meteor at all."

The reason I want to consider stuff more like this is because I don't like having to evaluate my utility function over all possibilities to do either an argmax *or* an argmin - I want to be lazy.

The weird thing about this is now whether this counts as argmax or argmin (or something else) depends on what my utility function looks like when I *do* include the meteor. If getting hit by the meteor only makes things worse (though potentially the meteor can still depend on which arm of of the bandit I pull!) then ignoring it is like being optimistic. If it only makes things better (like maybe the world I'm ignoring isn't a meteor, it's a big space full of other games I could be playing) then ignoring it is like being pessimistic.

Something analogous to what you are suggesting occurs. Specifically, let's say you assign 95% probability to the bandit game behaving as normal, and 5% to "oh no, anything could happen, including the meteor". As it turns out, this behaves similarly to the ordinary bandit game being guaranteed, as the "maybe meteor" hypothesis assigns all your possible actions a score of "you're dead" so it drops out of consideration.

The important aspect which a hypothesis needs, in order for you to ignore it, is that *no matter what you do you get the same outcome*, whether it be good or bad. A "meteor of bliss hits the earth and everything is awesome forever" hypothesis would also drop out of consideration because it doesn't really matter what you do in that scenario.

To be a wee bit more mathy, probabilistic mix of inframeasures works like this. We've got a probability distribution , and a bunch of hypotheses , things that take functions as input, and return expectation values. So, your prior, your probabilistic mixture of hypotheses according to your probability distribution, would be the function

It gets very slightly more complicated when you're dealing with environments, instead of static probability distributions, but it's basically the same thing. And so, if you vary your actions/vary your choice of function f, and one of the hypotheses is assigning all these functions/choices of actions the same expectation value, then it can be ignored completely when you're trying to figure out the best function/choice of actions to plug in.

So, hypotheses that are like "you're doomed no matter what you do" drop out of consideration, an infra-Bayes agent will always focus on the remaining hypotheses that say that what it does matters.

The meteor doesn't have to really flatten things out, there might be some actions that we think remain valuable (e.g. hedonism, saying tearful goodbyes).

And so if we have Knightian uncertainty about the meteor, maximin (as in Vanessa's link) means we'll spend a lot of time on tearful goodbyes.

TLDR: Infra-Bayesianism is a new approach to epistemology / decision theory / reinforcement learning theory, which builds on "imprecise probability" to solve the problem of prior misspecification / grain-of-truth / nonrealizability which plagues Bayesianism and Bayesian reinforcement learning. Infra-Bayesianism also naturally leads to an implementation of UDT, and (more speculatively at this stage) has applications to multi-agent theory, embedded agency and reflection. This post is the first in a sequence which lays down the foundation of the approach.

Prelude:Diffractor and Vanessa proudly present: The thing we've been working on for the past five months. I initially decided that Vanessa's scattered posts about incomplete models were interesting, and could benefit from being written up in a short centralized post. But as we dug into the mathematical details, it turned out it didn't really work, and then Vanessa ran across the true mathematical thing (which had previous ideas as special cases) and scope creep happened.

This now looks like a new, large, and unusually tractable vein of research. Accordingly, this sequence supersedes all previous posts about incomplete models, and by now we've managed to get quite a few interesting results, and have ideas for several new research directions.

Diffractor typed everything up and fleshed out the proof sketches, Vanessa originated almost all of the ideas and theorems. It was a true joint effort, this sequence would not exist if either of us were absent. Alex Mennen provided feedback on drafts to make it much more comprehensible than it would otherwise be, and Turntrout and John Maxwell also helped a bit in editing.

Be aware this sequence of posts has the math textbook issue where it requires loading a tower of novel concepts that build on each other into your head, and cannot be read in a single sitting.

We will be doing a group readthrough on MIRIxDiscord where we can answer questions and hopefully get collaborators, PM me to get a link.Introduction:Learning theory traditionally deals with two kinds of setting: "realizable" and "agnostic" or "non-realizable". In realizable settings, we assume that the environment can be described perfectly by a hypothesis inside our hypothesis space. (AIXI is an example of this) We then expect the algorithm to converge to acting as if it already knew the correct hypothesis. In non-realizable settings, we make no such assumption. We then expect the algorithm to converge to the best approximation of the true environment within the available hypothesis space.

As long as the computational complexity of the environment is greater than the computational complexity of the learning algorithm, the algorithm cannot use an easy-to-compute hypothesis that would describe the environment perfectly, so we are in the nonrealizable setting. When we discuss AGI, this is necessarily the case, since the environment is the entire world: a world that, in particular, contains the agent itself and can support other agents that are even more complex, much like how halting oracles (which you need to run Solomonoff Induction) are nowhere in the hypotheses which Solomonoff considers. Therefore, the realizable setting is usually only a toy model. So, instead of seeking guarantees of good behavior assuming the environment is easy to compute, we'd like to get good behavior simply assuming that the environment has some easy-to-compute properties that can be exploited.

For offline and online learning there are classical results in the non-realizable setting, in particular VC theory naturally extends to the non-realizable setting. However, for reinforcement learning there are few analogous results. Even for passive Bayesian inference, the best non-realizable result found in our literature search is Shalizi's which relies on ergodicity assumptions about the true environment. Since reinforcement learning is the relevant setting for AGI and alignment theory, this poses a problem.

Logical inductors operate in the nonrealizable setting, and the general reformulation of them in Forecasting Using Incomplete Models is of interest for broader lessons applicable to acting in an unknown environment. In said paper, reality can be drawn from any point in the space of probability distributions over infinite sequences of observations, Δ(Oω). Almost all of the points in this space aren't computable, and because of that, we shouldn't expect convergence to the true environment, as occurs in the realizable setting where the true environment lies in your hypothesis space.

However, even if we can't hope to learn the

trueenvironment, we can at least hope to learn somepropertyof the true environment, like "every other bit is a 0", and have our predictions reflect that if it holds. A hypothesis in this setting is a closed convex subset of Δ(Oω) which can be thought of as "I don't know what the true environment is, but it lies within this set". The result obtained in the above-linked paper was, if we fix a countable family of properties that reality may satisfy, and define the inductor based on them, then for all of those which reality fulfills, the predictions of the inductor converge to that closed convex set and so fulfill the property in the limit.What About Environments?However, this just involves sequence prediction. Ideally, we'd want some space that corresponds to environments that you can interact with, instead of an environment that just outputs bits. And then, given a suitable set B in it... Well, we don't have a fixed environment to play against. The environment could be

anything, even a worst-case one within B. We have Knightian uncertainty over our set of environments, it isnota probability distribution over environments. So, we might as well go with the maximin policy.argmaxπinfe∈B(Eπ⋅e[U])

Where π⋅e is the distribution over histories produced by policy π interacting with environment e. U is just some utility function.

When we refer to "Murphy", this is referring to whatever force is picking the worst-case environment to be interacting with. Of course, if you aren't playing against an adversary, you'll do better than the worst-case utility that you're guaranteed. Any provable guarantees come in the form of establishing lower bounds on expected utility if a policy is selected.

The problem of generating a suitable space of environments was solved in Reinforcement Learning With Imperceptible Rewards. If two environments are indistinguishable by any policy they are identified, a mixture of environments corresponds to picking one of the component environments with the appropriate probability at the start of time, and there was a notion of update.

However, this isn't good enough. We could find no good update rule for a set of environments, we had to go further.

Which desiderata should be fulfilled to make maximin policy selection over a set of environments (actually, we'll have to generalize further than this) to work successfully? We'll have three starting desiderata.

Desideratum 1:There should be a sensible notion of what it means to update a set of environments or a set of distributions, which should also give us dynamic consistency. Let's say we've got two policies, π and π′ which are identical except they differ after history h. If, after updating on history h, the continuation of π′ looks better than the continuation of π, then it had better be the case that, viewed from the start, π′ outperforms π.Desideratum 2:Our notion of a hypothesis (set of environments) in this setting should collapse "secretly equivalent" sets, such that any two distinct hypotheses behave differently insomerelevant aspect. This will require formalizing what it means for two sets to be "meaningfully different", finding a canonical form for an equivalence class of sets that "behave the same in all relevant ways", and then proving some theorem that says we got everything.Desideratum 3:We should be able to formalize the "Nirvana trick" (elaborated below) and cram any UDT problem where the environment cares about what youwoulddo, into this setting. The problem is that we're just dealing with sets of environments which only depend on what you do, not what your policy is, which hampers our ability to capture policy-dependent problems in this framework. However, since Murphy looks at your policy and then picks which environment you're in, thereisan acausal channel available for the choice of policy to influence which environment you end up in.The "Nirvana trick" is as follows. Consider a policy-dependent environment, a function Π×(A×O)<ω×A→ΔO (Ie, the probability distribution over the next observation depends on the history so far, the action you selected, and your policy). We can encode a policy-dependent environment as a set of policy-independent environments that don't care about your policy, by hard-coding every possible deterministic policy into the policy slot, making a family of functions of type (A×O)<ω×A→ΔO, which is the type of policy-independent environments. It's similar to taking a function f(x,y), and plugging in all possible x to get a family of functions that only depend on y.

Also, we will impose a rule that, if your action ever violates what the hard-coded policy predicts you do, you attain Nirvana (a state of high or infinite reward). Then, Murphy, when given this set of environments, will go "it'd be bad if they got high or infinite reward, thus I need to pick an environment where the hard-coded policy matches their

actualpolicy". When playing against Murphy, you'll act like you're selecting a policy for an environment thatdoespay attention to what policy you pick. As-stated, this doesn't quite work, but it can be repaired.There's two options. One is making Nirvana count as infinite reward. We will advance this to a point where we can capture any UDT/policy-selection problem, at the cost of some mathematical ugliness. The other option is making Nirvana count as 1 reward forever afterward, which makes things more elegant, and it is much more closely tied to learning theory, but that comes at the cost of only capturing a smaller (but still fairly broad) class of decision-theory problems. We will defer developing that avenue further until a later post.

A Digression on Deterministic PoliciesWe'll be using deterministic policies throughout. The reason for using deterministic policies instead of probabilistic policies (despite the latter being a larger class), is that the Nirvana trick (with infinite reward) doesn't work with probabilistic policies. Also, probabilistic policies don't interact well with embeddedness, because it implicitly assumes that you have a source of random bits that the rest of the environment can never interact with (except via your induced action) or observe.

Deterministic policies can emulate probabilistic policies by viewing probabilistic choice as deterministically choosing a finite bitstring to enter into a random number generator (RNG) in the environment, and then you get some bits back and act accordingly.

However, we aren't assuming that the RNG is a good one. It could be insecure or biased or nonexistent. Thus, we can model cases like Death In Damascus or Absent-Minded Driver where you left your trusty coin at home and don't trust yourself to randomize effectively. Or a nanobot that's too small to have a high bitrate RNG in it, so it uses a fast insecure PRNG (pseudorandom number generator). Or game theory against a mindreader that can't see your RNG, just the probability distribution over actions you're using the RNG to select from, like an ideal CDT opponent. It can also handle cases where plugging certain numbers into your RNG chip cause lots of heat to be released, or maybe the RNG is biased towards outputting 0's in strong magnetic fields. Assuming you have a source of true randomness that the environment can't read isn't general enough!

Motivating Sa-MeasuresSets of probability distributions or environments aren't enough, we need to add in some extra data. This can be best motivated by thinking about how updates should work in order to get dynamic consistency.

Throughout, we'll be using a two-step view of updating, where first, we chop down the measures accordingly (the "raw update"), and then we renormalize back up to 1.

So, let's say we have a set of two probability distributions μ1 and μ2. We have Knightian uncertainty within this set, we genuinely don't know which one will be selected, it may even be adversarial. μ1 says observation o has 0.5 probability, μ2 says observation o has 0.01 probability. And then you see observation o! The wrong way to update would be to go "well, both probability distributions are consistent with observed data, I guess I'll update them individually and resume being completely uncertain about which one I'm in", you don't want to ignore that one of them assigns 50x higher probability to the thing you just saw.

However, neglecting renormalization, we can do the "raw update" to each of them individually, and get m1 and m2 (finite measures, not probability distributions), where m1 has 0.5 measure and m2 has 0.01 measure.

Ok, so instead of a set of

probability distributions, since that's insufficient for updates, let's consider a set of measures m, instead. Each individual measure in that set can be viewed as λμ, where μ is a probability distribution, and λ≥0 is a scaling term. Note that λ is not uniform across your set, it varies depending on which point you're looking at.However, this

stillisn't enough. Let's look at a toy example for how to design updating to get dynamic consistency. We'll see we need to addone morepiece of data. Consider two environments where a fair coin is flipped, you see it and then say "heads" or "tails", and then you get some reward. The COPY Environment gives you 0 reward if you say something different than what the coin shows, and 1 reward if you match it. The REVERSE HEADS Environment always you 0.5 reward if the coin comes up tails, but it comes up heads, saying "tails" gets you 1 reward and "heads" gets you 0 reward. We have Knightian uncertainty between the two environments.For finding the optimal policy, we can observe that saying "tails" when the coin is tails helps out in COPY and doesn't harm you in REVERSE HEADS, so that's a component of an optimal policy.

Saying "tails" no matter what the coin shows means you get 0.5⋅0+0.5⋅1=0.5 utility on COPY, and 0.5⋅1+0.5⋅0.5=0.75 utility on REVERSE HEADS. Saying "tails" when the coin is tails and "heads" when the coin is heads means you get 0.5⋅1+0.5⋅1=1 utility on COPY and 0.5⋅0+0.5⋅0.5=0.25 utility on REVERSE HEADS. Saying "tails" no matter what has a better worst-case value, so it's the optimal maximin policy.

Now, if we see the coin come up heads, how should we update? The

wrongway to do it would be to go "well, both environments are equally likely to give this observation, so I've got Knightian uncertainty re: whether saying heads or tails gives me 1 or 0 utility, both options look equally good". This is because, according to past-you, regardless of what you did upon seeing the coin come up "tails", the maximin expected values of saying "heads" when the coin comes up heads, and saying "tails" when the coin comes up heads, are unequal. Past-you is yelling at you from the sidelines not to just shrug and view the two options as equally good.Well, let's say you

alreadyknow that you would say "tails" when the coin comes up tails and are trying to figure out what to do now that the coin came up heads. The proper way to reason through it is going "I have Knightian uncertainty between COPY which has 0.5 expected utility assured off-history since I say "tails" on tails, and REVERSE HEADS, which has 0.25 expected utility assured off-history. Saying "heads" now that I see the coin on heads would get me (0.5×1)+0.5=1 expected utility in COPY and (0.5×0)+0.25=0.25 utility in REVERSE HEADS, saying "tails" would get me (0.5×0)+0.5=0.5 utility in COPY and (0.5×1)+0.25=0.75 utility in REVERSE HEADS, I get higher worst-case value by saying "tails"." And then you agree with your past self re: how good the various decisions are.Huh, the proper way of doing this update to get dynamic consistency requires keeping track of the fragment of expected utility we get off-history.

Similarly, if you messed up and precommitted to saying "heads" when the coin comes up tails (a bad move), we can run through a similar analysis and show that keeping track of the expected utility off-history leads you to take the action that past-you would advise, after seeing the coin come up heads.

So, with the need to keep track of that fragment of expected utility off-history to get dynamic consistency, it isn't enough to deal with finite measures m, that still isn't keeping track of the information we need. What we need is (m,b), where m is a finite measure, and b is a number ≥0. That b term keeps track of the expected value off-history so we make the right decision after updating. (We're glossing over the distinction between probability distributions and environments here, but it's inessential)

We will call such a (m,b) pair an "affine measure", or "a-measure" for short. The reason for this terminology is because a measure can be thought of as a linear function from the space of continuous functions to R. But then there's this +b term stuck on that acts as utility, and a linear function plus a constant is an affine function. So, that's an a-measure. A pair of a finite measure and a b term where b≥0.

But wait, we can go even further! Let's say our utility function of interest is bounded. Then we can do a scale-and-shift until it's in [0,1].

Since our utility function is bounded in [0,1]... what would happen if you let in measures with negative parts, but only if they're paired with a sufficiently large b term? Such a thing is called an sa-measure, for signed affine measure. It's a pair of a finite signed measure and a b term that's as-large-or-larger than the amount of negative measure present. No matter your utility function, even if it assigns 0 reward to outcomes with positive measure and 1 reward to outcomes with negative measure, you're still assured nonnegative expected value because of that +b term. It turns out we actually

doneed to expand in this direction to keep track of equivalence between sets of a-measures, get a good tie-in with convex analysis because signed measures are dual to continuous functions, and have elegant formulations of concepts like minimal points and the upper completion.Negative measures may be a bit odd, but as we'll eventually see, we can ignore them and they only show up in intermediate steps, not final results, much like negative probabilities in quantum mechanics. And if negative measures ever become relevant for an application, it's effortless to include them.

Belief Function MotivationAlso, we'll have to drop the framework we set up at the beginning where we're considering sets of environments, because working with sets of environments has redundant information. As an example, consider two environments where you pick one of two actions, and get one of two outcomes. In environment e0, regardless of action, you get outcome 0. In environment e1, regardless of action, you get outcome 1. Then, we should be able to freely add an environment e2, where action 0 implies outcome 0, and where action 1 implies outcome 1. Why?

Well, if your policy is to take action 0, e2 and e0 behave identically. And if your policy is to take action 1, e2 and e1 behave identically. So, adding an environment like this doesn't affect anything, because it's a "chameleon environment" that will perfectly mimic

somepreexisting environment regardless of which policy you select. However, if you consider the function mapping an action to the set of possible probability distributions over outcomes, adding e2 didn't change that at all. Put another way, if it's impossible to distinguish in any way whether an environment was added to a set of environments because no matter what you do it mimics a preexisting environment, we might as well add it, and seek some alternate formulation instead of "set of environments" that doesn't have the unobservable degrees of freedom in it.To eliminate this redundancy, the

truething we should be looking at isn't a set of environments, but the "belief function" from policies to sets of probability distributions over histories. This is the function produced by having a policy interact with your set of environments and plotting the probability distributions you could get. Given certain conditions on a belief function, it is possible to recover a set of environments from it, but belief functions are more fundamental. We'll provide tools for taking a wide range of belief functions and turning them into sets of environments, if it is desired.Well, actually, from our previous discussion, sets of probability distributions are insufficient, we need a function from policies to sets of sa-measures. But that's material for later.

ConclusionSo, our fundamental mathematical object that we're studying to get a good link to decision theory is not sets of probability distributions, but sets of sa-measures. And instead of sets of environments, we have functions from policies to sets of sa-measures over histories. This is because probability distributions alone aren't flexible enough for the sort of updating we need to get dynamic consistency, and in addition to this issue, sets of environments have the problem where adding a new environment to your set can be undetectable in any way.

In the next post, we build up the basic mathematical details of the setting, until we get to a duality theorem that reveals a tight parallel between sets of sa-measures fulfilling certain special properties, and probability distributions, allowing us to take the first steps towards building up a version of probability theory fit for dealing with nonrealizability. There are analogues of expectation values, updates, renormalizing back to 1, priors, Bayes' Theorem, Markov kernels, and more. We use the "infra" prefix to refer to this setting. An infradistribution is the analogue of a probability distribution. An infrakernel is the analogue of a Markov kernel. And so on.

The post after that consists of extensive work on belief functions and the Nirvana trick to get the decision-theory tie-ins, such as UDT behavior while still having an update rule, and the update rule is dynamically consistent. Other components of that section include being able to specify your entire belief function with only part of its data, and developing the concept of Causal, Pseudocausal, and Acausal hypotheses. We show that you can encode almost any belief function as an Acausal hypothesis, and you can translate Pseudocausal and Acausal hypotheses to Causal ones by adding Nirvana appropriately (kinda). And Causal hypotheses correspond to actual sets of environments (kinda). Further, we can mix belief functions to make a prior, and there's an analogue of Bayes for updating a mix of belief functions. We cap it off by showing that the starting concepts of learning theory work appropriately, and show our setting's version of the Complete Class Theorem.

Later posts (not written yet) will be about the "1 reward forever" variant of Nirvana and InfraPOMDP's, developing inframeasure theory more, applications to various areas of alignment research, the internal logic which infradistributions are models of, unrealizable bandits, game theory, attempting to apply this to other areas of alignment research, and... look, we've got a lot of areas to work on, alright?

If you've got the relevant math skills, as previously mentioned, you should PM me to get a link to the MIRIxDiscord server and participate in the group readthrough, and you're more likely than usual to be able to contribute to advancing research further, there's a lot of shovel-ready work available.

Links to Further Posts: