Infra-Bayesianism is a recent theoretical framework in AI Alignment, coming from Vanessa Kosoy and Diffractor (Alexander Appel). It provides the groundwork for a learning theory of RL in the non-realizable case (when the hypothesis is not included in the hypothesis space), which ensures that updates don’t throw away useful information, and which also satisfies important decision-theoretic properties playing a role in Newcomb-like problems.
Unfortunately, this sequence of posts is really dense, and uses a lot of advanced maths in a very “textbook” approach; it’s thus hard to understand fully. This comes not from a lack of intuition (Diffractor and Vanessa are both very good at providing intuitions for every idea), but from the sheer complexity of the theory, as well as implicit (or quickly mentioned) links with previous research.
Thus my goal in this post is to give enough details for connecting the intuitions provided and the actual results, as well as the place of Infra-Bayesianism within the literature. I will not explain every proof and every result (if only because I’m not sure of my understanding of all of them). But I hope by the end of this post, you have a clearer map of Infra-Bayesianism, one good enough to dig into the posts themselves.
This post is split into three sections.
Reading advice: The reader is king/queen, but I still believe that there are two main ways to read this post to take something out of it:
Thanks to Vanessa and Diffractor for going above and beyond in answering all my questions and providing feedback on this post. Thanks to Jérémy Perret for feedback on this post.
Before going into the problem tackled by Infra-Bayesianism, I want to give some context in which to judge the value of this research.
AI Alignment is not yet a unified field; among other things, this means that a lot of researchers disagree on what one should work on, what constitutes a good solution, what is useful and what isn’t. So the first thing I look for when encountering a new piece of AI Alignment research is its set of underlying assumptions. In rationalist parlance, we would say the cruxes.
Infra-Bayesianism, just like most of Vanessa’s research, relies on three main cruxes made explicit in her research agenda:
In summary, Infra-Bayesianism makes sense as a component of a theory of RL, with the goal of proving formal guarantees on alignment and safety. Even if you disagree with some of these cruxes, I feel that being aware of them will help you understand these posts better.
The main idea motivating Infra-Bayesianism is the issue of non-realizability. Realizability is a common assumption on learning tasks, where the thing we are trying to learn (the function to approximate for example) is part of the hypothesis space considered. Recall that because of fundamental results like the no-free-lunch theorems, learning algorithms cannot consider all possible hypotheses equally -- they must have inductive biases which reduce the hypothesis space and order its elements. Thus even in standard ML, realizability is a pretty strong assumption.
And when you go from learning a known function (like XOR) to learning a complex feature of the real world, then another problem emerges, related to embedded agency: the learning agent is embedded into the world it wants to model, and so is smaller in an intuitive sense. Thus assuming that the hypothesis space considered by the learning algorithm (which is in some sense represented inside the algorithm) contains the function learned (the real world) becomes really improbable, at least from a computational complexity perspective.
One important detail that I missed at first when thinking about non-realizability is that the issue comes from assuming that the true hypothesis is one of the efficiently computable hypotheses which form your hypothesis space. So we’re still in the non-realizable setting even if we know the true hypothesis, when it’s either uncomputable or prohibitively expensive to compute.
Going back to Infra-Bayesianism, non-realizability is a necessity for any practical mathematical theory of RL. But as explained in the first post of the sequence, there are not that many results on learning theory for RL:
For offline and online learning there are classical results in the non-realizable setting, in particular VC theory naturally extends to the non-realizable setting. However, for reinforcement learning there are few analogous results. Even for passive Bayesian inference, the best non-realizable result found in our literature search is Shalizi's which relies on ergodicity assumptions about the true environment. Since reinforcement learning is the relevant setting for AGI and alignment theory, this poses a problem.
If you’re like me, you get the previous paragraph, with the possible exception of the part about “ergodicity assumptions”. Such assumptions, roughly speaking, mean that the distribution of the stochastic process (here the real world) eventually stabilizes to a fixed distribution. For the real world, that will probably happen around the heat-death of the universe. So it’s still a very oversimplified assumption, one that Infra-Bayesianism removes.
Now, the AI Alignment literature contains a well-known example of a non-realizable approach: Logical Induction. The quick summary is that Logical Induction deals with predicting logical consequences of known facts that are not yet accessible due to computational limits, in ways that ensure mistakes cannot be exploited for an infinite amount of “money” (in a market setting where predictions decide the “prices”). Logical inductors (algorithms solving Logical Induction) deal with a non-realizable setting because the guarantee they provide (non-exploitation) doesn’t depend on the “true” probability distribution. Equivalently, logical inductors attempt to approximate a probability distribution over logical sentences that is uncomputable, and that has no computable approximation in full.
Building on Logical Induction (and a parallel line of research, which includes the idea of Defensive Forecasting), a previous paper by Vanessa, titled Forecasting Using Incomplete Models, extended these ideas to more general, abstract and continuous settings (instead of just logic). The paper still deals with non-realizability, despite having guarantees that depend on the true hypothesis. This is because the guarantees have premises about whether the true hypothesis is inside an efficiently computable set of hypotheses (a convex set), instead of requiring that the true hypothesis is itself efficiently computable. So instead of having a handful of hypotheses we can compute and saying “it’s one of them”, Forecasting Using Incomplete Models uses efficiently computable properties of hypotheses, and says that if the true hypothesis satisfies one of these properties, then an efficiently computable hypothesis with the same guarantees will be learned.
This idea of sets of probability distributions also appears in previous takes on imprecise probabilities, notably in Walley’s Statistical Reasoning with Imprecise Probabilities and Peng’s Nonlinear Expectations and Stochastic Calculus under Uncertainty. That being said, Vanessa and Diffractor heard about these only after finishing most of the research on Infra-Bayesianism. These previous works on imprecise probabilities also don’t deal with the decision theory aspects of Infra-Bayesianism.
Lastly, all the ideas presented for prediction above, from logical induction to imprecise probabilities, provide guarantees about the precision of prediction. But for a theory of RL, what we want are guarantees about expected utility. This leads directly to Infra-Bayesianism.
The main object of Infra-Bayesianism is the infradistribution (Definition 7 in Basic Inframeasure Theory): a set of “pimped up” probability distributions called sa-measures. These sa-measures capture information like the weight of the corresponding distribution in the infradistribution and the off-history utility, which prove crucial for decision-theoretic reasoning further down the line (in Belief Functions and Decision Theory). Infradistributions themselves satisfy many conditions (recapped here in Basic Inframeasure Theory), which serve to ensure they’re the kind of computable properties of environments/distributions that we want for our incomplete models.
Basic Inframeasure Theory, the first technical post in the sequence, defines everything mentioned previously from the ground up. It also brushes up on the measure theory and functional analysis used in the results, as well as showing more advanced results like a notion of update (Definition 11) that takes into account what each sa-measure predicted, the corresponding Bayes Theorem for infradistributions (Theorem 6), a duality result which allows manipulation of infradistributions as concave, monotone, and uniformly continuous functionals (Theorem 4), and a lot of other useful theoretical constructions and properties (see for example the section Additional Constructions).
The next post, Belief Functions and Decision Theory, focuses on using Infra-Bayesianism in a decision theoretic and learning theoretic setting. At least the decision theoretic part is the subject of Section 3 in the present post, but before that, we need to go into more details about some basic parts of inframeasure theory.
(Between the first draft of this post and the final version, Vanessa and Diffractor published a new post called Less Basic Inframeasure Theory. Its focus on advanced results means I won’t discuss it further in this post.)
Recall that we want to build a theory of RL. This takes the form of guarantees on the expected utility. There’s only one problem: we don’t have a distribution over environments on which to take the expectation!
As defined above, an infradistribution is a set of probability distributions (technically sa-measures, but that’s not important here). We thus find ourselves in the setting of Knightian uncertainty: we only know the possible “worlds”, not their respective probability. This fits with the fact that in the real world, we don’t have access to clean probabilities between the different environments we consider.
As theoretical computer scientists, Vanessa and Diffractor are fundamentally pessimistic: they want worst-case guarantees. Within a probabilistic setting, even our crowd of paranoid theoretical computer scientists will get behind a guarantee with a good enough probability. But recall that we have Knightian uncertainty! So we don’t have a quantitative measure of our uncertainty.
Therefore, the only way to have a meaningful guarantee is to assume an adversarial setting: Murphy, as he’s named in the sequence, chooses the worst environment possible for us. And we want a policy that maximizes the expected utility within the worst possible environment. That is, we take the maxmin expected utility over all environments considered.
To summarize, we want to derive guarantees about the maxmin expected utility of the policy learned.
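The maxmin criterion described above can be sketched in a few lines of code. This is purely illustrative (the names `expected_utility`, `maxmin_value` and the toy environments are mine, not the sequence's formalism): environments map a policy to a distribution over outcomes, and Murphy picks the environment after we pick the policy.

```python
# Illustrative sketch of maxmin expected utility under Knightian
# uncertainty; names and numbers are made up for this example.

def expected_utility(policy, env, utility):
    # env maps a policy to a distribution {outcome: probability}.
    return sum(p * utility(outcome) for outcome, p in env(policy).items())

def maxmin_value(policies, envs, utility):
    # For each policy, Murphy minimizes over environments;
    # we then pick the policy with the best worst case.
    return max(
        min(expected_utility(pi, env, utility) for env in envs)
        for pi in policies
    )

# Two environments we have Knightian uncertainty between:
env_a = lambda pi: {2.0: 1.0} if pi == "risky" else {1.0: 1.0}
env_b = lambda pi: {0.0: 1.0} if pi == "risky" else {1.0: 1.0}
```

Here the “risky” policy can pay off at 2.0, but Murphy would pick `env_b` where it yields 0; the maxmin value is thus the safe 1.0.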
So we want guarantees on maxmin expected utility for our given infradistributions. The last detail that’s missing concerns the elements of infradistributions: sa-measures. What are they? Why do we need them?
The answer to both questions comes from considering updates. Intuitively, we want to use an infradistribution just like a prior over environments. Following the analogy, we might wonder how to update after an action is taken and a new observation comes in. For a prior, you do a simple bayesian update of the distribution. But what do you do for an infradistribution?
Since it is basically a set of distributions, the obvious idea is to update every distribution (every environment in the set) independently. This has two big problems: loss of information and dynamic inconsistency.
Relative probabilities of different environments
In a normal Bayesian update, if an environment predicted the current observation with a higher probability than another environment, then you would update your distribution in favor of the former environment. But our naive update for infradistributions fails on this count: both environments would be updated by themselves, and then put in a set.

Infra-Bayesianism’s solution for that is to consider environments as scaled distributions instead. The scaling factor plays the role of the probability in a distribution, but without some of the more stringent constraints.
Now, these scaled measures don’t have a name, because they’re not the final form of environments in Infra-Bayesianism.
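A toy illustration of what the scaling factor buys us (the numbers and names here are mine): two environments start with equal scale, one predicts the observation with probability 0.9 and the other with probability 0.1. Multiplying each scale by its likelihood preserves exactly the information that the naive “update each distribution and renormalize” approach throws away.

```python
# Toy sketch: scaled measures keep track of how well each environment
# predicted the observation. All numbers are illustrative.

def update_scale(scale, prob_of_observation):
    # Scale by the likelihood instead of renormalizing it away.
    return scale * prob_of_observation

# Environment A predicts the observation with probability 0.9, B with 0.1.
scales = {"A": update_scale(1.0, 0.9), "B": update_scale(1.0, 0.1)}
# A now carries 9 times the weight of B; a naive per-environment
# Bayesian update would have left both with weight 1.
```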
Even with scaled measures, there is still an issue: dynamic inconsistency. Put simply, dynamic inconsistency is when the action made after some history is not the one that would have been decided by the optimal policy from the start.
For those of you that know a lot of decision theory, this is related to the idea of commitments, and how they can ensure good decision-theoretic properties.
For others, like me, the main hurdle for understanding dynamic consistency is to see how deciding the best action at each step could be suboptimal, if you can be predicted well enough. And the example that drives that home for me is Parfit’s hitchhiker.
You’re stranded in the desert, and a car stops near you. The driver can get you to the next city, as long as you promise to give him a reward when you reach civilization. Also very important, the driver is pretty good at reading other human beings.
Now, if you’re the kind of person that makes the optimal decision at each step, you’re the kind of person that would promise to give a reward, and then not give it when you reach your destination. But the driver can see that, and thus leaves you in the desert. In that case, it would have been optimal to commit to give the reward and not defect at your destination.
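The arithmetic behind this example is worth making explicit. With made-up payoffs (the ride is worth 10, the reward costs 1, the desert is worth 0) and a perfectly accurate driver, committing to pay strictly beats step-by-step optimization:

```python
# Toy payoff computation for Parfit's hitchhiker; the utility numbers
# are my own, chosen only to make the comparison concrete.

def expected_outcome(will_pay, predictor_accuracy=1.0):
    # The driver rescues you exactly when they predict you will pay.
    p_rescued = predictor_accuracy if will_pay else 1.0 - predictor_accuracy
    utility_if_rescued = 10 - 1 if will_pay else 10
    return p_rescued * utility_if_rescued

# Committing to pay: 1.0 * 9 = 9. Planning to defect: 0.0 * 10 = 0.
```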
Another scenario, slightly less obvious, is a setting where Murphy can choose between two different environments, such that the maxmin expected utility of choosing the optimal choice at each step is lower than for another policy. Vanessa and Diffractor give such an example in the section Motivating sa-measures of Introduction To The Infra-Bayesianism Sequence.
The trick is that you need to keep in mind what expected utility you would have if you were not in the history you’re seeing. That’s because at each step, you want to take the action that maximizes the minimal expected utility over the whole environment, not just the environment starting where you are.
Vanessa and Diffractor call this the “off-history” utility, which they combine with the scaled measure to get an a-measure (Definition 3 in Basic Inframeasure Theory). There’s a last step that lets the measure be negative, as long as the off-history utility term is bigger than the absolute value of the negative part of the measure: this gives an sa-measure (Definition 2 in Basic Inframeasure Theory). But that’s mostly relevant for the math, less for the intuitions.
So to get dynamic consistency, one needs to replace distributions in the sets with a-measures or sa-measures, and then maintain the right information appropriately. This is why the definition of infradistributions uses them.
Interestingly, doing so is coherent with Updateless Decision Theory, the main proposal for a decision theory that deals with Newcomb-like or Parfit’s hitchhiker types of problems. Note that we didn’t build any of the concepts in order to get back UDT. It’s simply a consequence of wanting to maxmin expected utility in this context.
UDT also helps with understanding the point of updates despite dynamic consistency: instead of asking for a commitment at the beginning of time for anything that might happen, dynamically consistent updates allow decisions to be computed online while still being coherent with the ideal precommitted decision. (They don’t solve the problem of computing the utility off-history, though.)
Lastly, I want to focus on one of the many paths through Infra-Bayesianism. Why this one? Because I feel it is the most concrete I could find, and it points towards links with decision theory that were non-obvious (to me at least).
This path starts in the third post of the sequence, Belief Functions and Decision Theory.
Belief functions (Definition 11 in Belief Functions and Decision Theory) are functions which take as input a partial policy (according to Definition 4), and return a set of a-measures (according to the definitions in Basic Inframeasure Theory mentioned above) on the outcome set of this partial policy (according to Definition 8).
We have already seen a-measures in the previous sections: they are built from a scaled distribution (here over outcomes) and a scalar term that tracks the off-history utility (to maintain dynamic consistency). For the rest of the new terms, here are the simple explanations.
To summarize, a belief function takes a policy, which gives new actions from histories ending in observations, and returns a property on distributions over the final histories of this policy. This generalizes a function that takes a policy and returns a distribution over histories.
Now, a confusing part of the Belief Functions and Decision Theory post is that it doesn’t explicitly tell you that this set of a-measures over outcomes actually forms an infradistribution, which is the main mathematical object of Infra-Bayesianism. And to be exact, the outputs of a belief function are guaranteed to be infradistributions only if the belief function satisfies the conditions listed here. Some of these conditions follow directly from the corresponding conditions for infradistributions; others depend on the Nirvana trick, which we will delve into later; still others are not that important for understanding the gist of Infra-Bayesianism.
So at this point in the post, we can go back to Basic Inframeasure Theory and look at the formal definition of infradistributions. Indeed, such a definition is fundamental for using belief functions as analogous to environments (functions sending policies to a distribution over histories).
The general case presented in Basic Inframeasure Theory considers measures, a-measures and sa-measures defined over potentially infinite sets (the outcome set might be infinite, for example if the policy is defined for every o-history). This requires assumptions on the structure of the set (compactness for example), and forces the use of complex properties of the space of measures (being a Banach space among other things), which ultimately warrants the use of functional analysis, the extension of linear algebra to infinite dimensional spaces.
Personally, I’m not well read enough in measure theory and functional analysis to follow everything without going back and forth between twenty Wikipedia pages, and even then I had trouble keeping up with the high-level abstractions.
Fortunately, there is a way to tremendously simplify the objects with which we work: assume the finiteness of the set on which measures are defined. This can be done naturally in the case of outcome sets, by considering Xn=Fn(πpa), the set of outcomes of length ≤n.
In that context, a measure over Xn is equivalent to a function from a finite domain to R+; which is equivalent to a point in (R+)|Xn|. So the space of measures over Xn is just the Euclidean space of dimension |Xn|. We’re back into linear algebra!
Now geometrical intuition can come to our help. Take Definition 2 of an sa-measure: it is just a point of (R+)|Xn|+1 such that the sum of the negative numbers among its first |Xn| components is less in absolute value than the last component. And an a-measure (from Definition 3) is an sa-measure where every component is non-negative. The sets Msa(Xn) and Ma(Xn) are then respectively the set of all sa-measures and the set of all a-measures.
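These finite-dimensional conditions are simple enough to write down directly. A minimal sketch, with a point written as a tuple (m_1, …, m_n, b), following the conditions as stated above (the function names are mine):

```python
# Checking the finite-dimensional sa-measure and a-measure conditions
# (Definitions 2 and 3), with a point represented as (m_1, ..., m_n, b).

def is_sa_measure(point):
    *m, b = point
    # b is non-negative, and the total negative mass among the first
    # components is covered (in absolute value) by b.
    return b >= 0 and sum(x for x in m if x < 0) + b >= 0

def is_a_measure(point):
    # An sa-measure where every component is non-negative.
    return is_sa_measure(point) and all(x >= 0 for x in point)
```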
We can even visualize them pretty easily (with |Xn|=1):
There’s one more definition to go through before attacking infradistributions: the definition of the expectation of some continuous function from Xn to R by a set of sa-measures B. This is described by Vanessa and Diffractor as the behavior of f (continuous from Xn to [0,1]) according to B. Definition 4 gives EB(f) as the infimum of m(f)+b for (m,b)∈B. And in our finite case, m(f)=∑x∈Xnm(x)f(x). So EB(f) can be rewritten as the infimum of ∑x∈Xnm(x)f(x)+b for (m,b)∈B.
Intuitively, EB(f) represents the worst expected utility possible over B, where f is the utility function. This fits with our previous discussion of Knightian Uncertainty and Murphy, because we assume that the environment picked (the sa-measure) is the worst possible for us. That is, the one with the worst expected utility.
Geometrically in our finite setting, this is the smallest dot product of a point in ¯B with the point of (R+)|Xn|+1 which has for its first |Xn| components the values of f for the corresponding element of Xn, and for its last component 1.
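In the finite case the infimum is just a minimum over a finite set, which makes EB(f) a one-liner. A sketch (representation is mine: B as a list of pairs, with each measure a dict from outcomes to mass):

```python
# The finite-case expectation E_B(f) from Definition 4, written as a
# plain minimum; B is a finite list of sa-measures (m, b).

def expectation(B, f, outcomes):
    return min(sum(m[x] * f(x) for x in outcomes) + b for (m, b) in B)

# Two unit measures, one concentrated on "x" and one on "y":
B = [({"x": 1.0, "y": 0.0}, 0.0), ({"x": 0.0, "y": 1.0}, 0.0)]
```

With a utility of 1 on "x" and 0 on "y", Murphy picks the measure concentrated on "y", so the expectation is 0; the constant utility 1 gets expectation 1, matching the normalization h(1)=1 seen later.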
We can finally go to infradistributions: an infradistribution B is just a set of sa-measures satisfying some conditions. I’ll now go through them, and try to provide as much intuition as possible.
Armed with these conditions, we now understand Definition 7 of □Xn, the set of infradistributions: it contains all the sets of sa-measures that satisfy the conditions above.
There is another way to think about infradistributions: as functionals (in this case, applications from functions to R) with specific properties. This duality is crucial in many proofs and to build a better intuition of Infra-Bayesianism.
Given an infradistribution as a set B, how do we get its dual version? Easy: it’s the function h defined by h(f)=EB(f). So the expectation with regard to our set B is the other way to see and define the infradistribution. Theorem 4 states this correspondence, as well as the properties that h gets from being defined in this way through B:
Theorem 4, LF-duality, Sets to Functionals: If B is an infradistribution/bounded infradistribution, then h:f↦EB(f) is concave, monotone, uniformly continuous/Lipschitz over C(X,[0,1]), h(0)=0, h(1)=1, and range(f)⊈[0,1]⟹h(f)=−∞.
Let’s look at the properties of h.
To summarize, Conditions 4 to 7 for infradistributions as sets do most of the work, while Conditions 1 to 3 are not considered since they just increase the size of the set without changing the expectation (technically Condition 1 is necessary everywhere, but it’s trivial).
Like a healthy relationship, a good duality goes both ways. Hence Theorem 5, which shows how to get an infradistribution as a set from an infradistribution as a functional (satisfying the conditions studied above). The proof of this one is way more involved, which is why I won’t go into it.
That being said, there is a nice way to visualize the set of sa-measures coming from an expectation in the finite dimensional case. Let’s say |Xn|=1. So there is only one outcome x. Let’s say we have an h satisfying all the properties above. Notably, it’s concave. Then the sa-measures of the corresponding infradistribution as a set are all the pairs (m(x),b) such that m(x)f(x)+b≥h(f) for all f. Visually, these are all the lines above h (which is basically a function on [0,1], since f is completely determined by its value at x).
In this plot, h is the blue function, and all other functions correspond to sa-measures in the dual “infradistribution as a set”. This provides a really cool geometrical intuition for some conditions on infradistributions. For example, upper completeness only means that we can add any line to one of our lines/sa-measures, and we’ll still be above h. Or minimal points being a-measures means that they are the tangents of h (like the pink and yellow ones on the plot). And it generalizes in higher dimensions, by replacing lines with hyperplanes.
(To be clear, I didn’t come up with this geometric perspective; Diffractor explained it to me during a discussion about the duality).
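This “lines above h” picture can be checked numerically. A sketch of my own, with |Xn|=1, where f is identified with the number t=f(x)∈[0,1] and an sa-measure (m,b) corresponds to the line t↦mt+b:

```python
# Numerical check of the duality picture for |Xn| = 1: an sa-measure
# (m, b) belongs to the dual set iff its line stays above h on [0,1].

def line_above_h(m, b, h):
    grid = [i / 100 for i in range(101)]
    return all(m * t + b >= h(t) for t in grid)

# An example h: concave, monotone, with h(0) = 0 and h(1) = 1.
h = lambda t: min(2 * t, 1.0)
```

The line 2t is a supporting line of this h, the constant line 1 is another, and the line t dips below h; adding any non-negative (m′,b′) to a line above h keeps it above h, which is the geometric face of upper completeness.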
So infradistributions are both sets of sa-measures and functionals, both satisfying specific conditions. The functional perspective is cleaner for proofs, but I’ll stick with the set perspective in the rest of this post.
Recall that “nice” belief functions return infradistributions on the outcome set of a policy. This is never stated explicitly in Belief Functions and Decision Theory, but follows from the first conditions on belief functions from this section.
Other conditions matter for manipulating belief functions, like consistency and Hausdorff-continuity. But the point of this section isn’t to make you master Belief Functions and Decision Theory; it’s to give you a path through it. And the last big idea on the path is causality, and its relation to the Nirvana trick.
Indeed, if you’ve read the sequence before, you might be surprised by me not mentioning the Nirvana trick already. My reason is that I only understood it correctly after getting causality, and causality requires all the background I laid out already.
The Nirvana Trick: Making Murphy Useful
Recall that we have Knightian Uncertainty over the environments we consider. So instead of maximizing the expected utility over a distribution of environments, we use worst-case reasoning, by assuming the environment is chosen by an adversary Murphy. This is a pretty neat setting, until we consider environments that depend on the policy. This happens notably in Newcomb-like problems (of which Parfit’s Hitchhiker is an example), which are an important fighting ground for decision-theories.
Now, it’s not so much that representing such environments is impossible; instead, it’s that what we think of as environments is usually simpler. Notably, what happens depends only on the action taken by the policy, not on the one it could have taken in other situations. This is also a setting where our intuitions about decisions are notably simpler, because we don’t have to think about predictions and causality in their full extent.
The Nirvana trick can be seen as a way to keep this intuition of environments, while still having a dependence of the environment on the policy. It starts with the policy-dependent environment, and then creates one policy-independent environment for each policy, by hard-coding this policy in the parameter slot of the policy-dependent environment. But that doesn’t guarantee that the hardcoded policy will match the actual policy. This is where Nirvana appears: if the policy acts differently than the hardcoded policy, it “goes to Nirvana”, meaning it gets maximum return (either through an infinite reward at that step or with a reward of 1 for each future step). Murphy, who wants to minimize your utility, will thus never choose an environment where Nirvana can be reached, that is, never choose the ones with a different policy in the parameter slot.
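The construction above can be sketched in code. Everything here is illustrative (the names, the history representation, and the choice of infinite reward for Nirvana are mine): we hardcode a policy into the parameter slot, and any deviation from it yields maximal reward.

```python
# Sketch of the Nirvana trick: turn one policy-dependent environment
# into a family of policy-independent ones, one per hardcoded policy.

NIRVANA = float("inf")  # maximal return; one of the two conventions

def nirvana_env(policy_dependent_env, hardcoded_policy):
    def env(history, actual_action):
        if actual_action != hardcoded_policy(history):
            return NIRVANA  # deviating from the hardcoded policy
        return policy_dependent_env(hardcoded_policy, history)
    return env

# A toy policy-dependent environment: reward 1 iff the (predicted)
# policy pays, 0 otherwise.
pd_env = lambda policy, history: 1.0 if policy(history) == "pay" else 0.0
env_pay = nirvana_env(pd_env, lambda h: "pay")
```

Since Murphy minimizes utility, he will never pick an environment whose hardcoded policy differs from the actual one, because the agent could then reach Nirvana.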
To understand better the use of the Nirvana trick, we need to define different kinds of belief functions (called hypotheses) such that adding or removing Nirvana goes from one to the other.
Causality, Pseudocausality and Acausality
The three types of hypotheses considered in Belief Functions and Decision Theory are causal, pseudocausal and acausal. Intuitively, a causal hypothesis corresponds to a set of environments which doesn’t depend on the policy; a pseudocausal hypothesis corresponds to a set of environments which depends on the policy in some imperfect way; and an acausal hypothesis corresponds to a set of environments completely and exactly determined by the policy.
Causality can be made formal through the introduction of outcome functions (Definition 15), functions from a policy to a single sa-measure on the outcome set of this policy. On the other hand, recall that belief functions return an infradistribution, which is a set of sa-measures on that same set. Compared with the usual Bayesian setting, a belief function returns something analogous to a probability distribution over probability distributions over histories, while an outcome function returns something analogous to a single probability distribution over histories. An outcome function thus plays the role of an environment, which takes in a policy and gives a distribution over the outcomes/histories generated.
There is one additional subtlety about outcome functions that plays a big role in the rest of the formalism. If you look at Definition 15 in Belief Functions and Decision Theory, it requires something about the projection of partial policies. The projection mapping (Definition 9) sends an sa-measure over a policy π1 to an sa-measure over a policy π2, if π1 is defined on strictly more histories than π2 and they agree where they’re both defined. Basically, if π1 extends π2, we can project back a measure over the outcomes of π1 to a measure over the outcomes of π2, by summing the measure of all outcomes of π1 that share a given outcome of π2 as prefix.
Outcome functions must agree with that, in the sense that the outcome function applied to π2 must return the projection of what the outcome function returns when applied to π1. In that sense it’s a real environment: extending a policy only splits the probability given to each prefix, it doesn’t move probability between prefixes.
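The prefix-summing described above is easy to make concrete in the finite case. A sketch (the representation of outcomes as tuples of events, and the function name, are mine):

```python
# Finite sketch of the projection mapping (Definition 9): push a
# measure over the outcomes of an extending policy pi1 down to the
# outcomes of pi2, by summing the mass of every pi1-outcome whose
# prefix is a given pi2-outcome.

def project(measure, prefixes):
    projected = {p: 0.0 for p in prefixes}
    for outcome, mass in measure.items():
        for p in prefixes:
            if outcome[:len(p)] == p:
                projected[p] += mass
                break
    return projected

# Extending a policy only splits mass among longer outcomes;
# projecting back recovers the coarser measure, with total mass intact.
measure = {("a", "x"): 0.25, ("a", "y"): 0.25, ("b", "x"): 0.5}
coarse = project(measure, [("a",), ("b",)])
```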
Causal, pseudocausal and acausal hypotheses are defined through constraints related to the outcome functions corresponding to a belief function. They all share the first 9 Conditions on belief functions given here.
Perhaps one of the most philosophically important results of Infra-Bayesianism is that one can go from pseudocausality to causality by the Nirvana trick, and from causality to pseudocausality by removing Nirvana (Theorem 3.1). So the first direction basically means that if thinking about policy-dependency fries your brain, you can just add Nirvana, and voilà, everything is policy-independent and causal again. And equivalently, if you have a causal setting with the Nirvana trick, you can remove the trick at the price of only ensuring pseudocausality.
This looks really useful, because in my own experience, non-causal situations are really confusing. Having a formal means to convert to a more causal case (at the price of using the Nirvana trick) could thus help in clarifying some issues with decision theory and Newcomb-like problems.
(The same sort of result holds between acausal hypotheses and so-called surcausal hypotheses, but this one requires digging into so many subtle details that I will not present it here.)
Infra-Bayesianism provides a framework for studying learning theory for RL in the context of non-realizability. It is based around infradistributions, sets of distributions with additional data, which satisfy additional conditions for both philosophical and mathematical reasons. Among the applications of Infra-Bayesianism, it can be used to study different decision theory problems in a common framework, and to ensure updates which fit with what UDT would do at the beginning of time.
I hope that this post gave you a better idea of Infra-Bayesianism, and whether or not you want to take the time to dig deeper. If you do, I also hope that what I wrote will make navigation a bit easier.