Acknowledgements: This research began during the SERI MATS program, under the joint mentorship of John Wentworth, Nicholas Kees, and Janus. Thanks also to Davidad, Jack Sagar, and David Jaz Myers for discussion.
Abstract: I think that there is a uniform correspondence between flavours of uncertainty and monads taking state-spaces to belief-state-spaces, for different characterisations of belief. In this essay, I describe this correspondence explicitly and list 15 diverse and well-motivated examples. I explore some applications to model-building and agent foundations. Along the way, I characterise infrabayesian uncertainty as the minimal way to encompass possibilistic uncertainty, probabilistic uncertainty, and reward.
No prerequisites are required beyond a high-school familiarity with sets, functions, real numbers, etc. Feedback welcome.
Introduction
Suppose I'm facing the following problem. There's an upcoming election between n candidates, and you're uncertain who will win. How can I model both your belief about the election and the election itself in a coherent way? By "belief" here, I mean your epistemic attitude, your internal model, your opinion, judgement, prediction, etc, etc. Think map-territory distinction: the election is the territory, your belief is the map, and I need to model both the map and the territory coherently despite the fact that the map and the territory are (typically speaking) two completely different types of thing.
Well, to model the election itself, I'll use a set S={s1,s2,s3,…sn} with an element for each electoral candidate. To represent your belief about the election, I must find another set B(S) with an element for each belief that you might have about the election. I'll call S the state space and B(S) the belief-state space. A solution to our problem is given by a mathematical operator B sending each state-space S to the matching belief-state space B(S).
One may feel prompted to ask: does any operator B suffice here? Can the belief-state space be anything whatsoever, or must it carry some extra structure, possibly satisfying some additional constraints? Or, stated more philosophically, can any territory serve as a map for any other? I say no. Roughly speaking, the operator B must be a so-called monad, which will be the central object of this essay. But more on that later.
The first thing to note is that the appropriate operator B will depend on how exactly I wish to characterise a "belief" about the election, and there are multiple options here. For example, I might choose to characterise your belief by the set of candidates that you think have a possibility of winning. In this case, B(S):=P+(S), denoting the set of non-empty subsets of S. Alternatively, I might choose to characterise your belief by the likelihood that you give each candidate. In this case, B(S):=Δ(S), denoting the set of finite-support probability distributions over S, i.e. functions p:S→[0,1] such that {s∈S:p(s)≠0} is finite and ∑s∈Sp(s)=1.
In the first option, I'm characterising your belief-state by your possibilistic uncertainty, often encountered in doxastic or epistemic logic. In the second option, I'm characterising your belief-state by your probabilistic uncertainty, which is a finer-grained characterisation of belief because it differentiates between e.g. thinking a coin is fair and thinking a coin is slightly biased.
The second option has its merits. Indeed, many readers will instinctively reach for Δ as soon as they hear the word "uncertainty", and this instinct would serve them well. There's been a fruitful enterprise (in philosophy, mathematics, computer science, linguistics, etc) of replacing possibilistic uncertainty with probabilistic uncertainty in any model or concept where one finds it. But I want to note that both P+ and Δ would count as a solution to the problem. I'll return to these two examples throughout this essay because they are the flavours of uncertainty which will be most familiar to the reader.
Flavour of uncertainty → Monad
Possibilistic → nonempty-powerset monad P+
Probabilistic → distribution monad Δ
As we will see, these two operators, P+ and Δ, are both monads. The central claim of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. By "flavour of uncertainty" I mean a particular way of characterising someone's potentially uncertain belief about something. Possibilistic and probabilistic are paradigm cases, but in this essay we'll meet fifteen examples.
The forward-implication of this claim, that every flavour of uncertainty is a monad, is perhaps uncontroversial in some circles.[1] The backwards-implication, that every monad is a flavour of uncertainty, is worthy of more scepticism.
In this essay —
I will describe the correspondence explicitly.
I'll present a step-by-step method for formalising different flavours of uncertainty using monads.
I'll list fifteen examples of the correspondence, which I hope the reader finds well-motivated.
Finally, I'll discuss the relevance to agent foundations, with reference to infrabayesianism in particular.
Don't worry if you don't yet know what monads are. By the end of this essay you'll understand them as well as I do, which is enough to nod along when you hear "monad this" and "monad that".
The correspondence, explicitly
What's a flavour of uncertainty?
Recall from the introduction that I'm tasked with representing or modelling both the election itself and your belief about the election. The first step of this task is to settle on a particular flavour of uncertainty to characterise the belief-states — possibilistic, probabilistic, infrabayesian, etc. One might ask, of this flavour of uncertainty, the following four questions —
Count? What counts as a distinct belief about the election? Concretely, if there are n electoral candidates then how many distinct belief-states are there?
Certainty? If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
Collapse? Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
Combine? Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
These four questions — Count? Certainty? Collapse? Combine? — are essentially epistemological questions, and they collectively pin down what I mean by a flavour of uncertainty.[2] As we will see, a monad corresponds to answers to the first three questions and a commutative monad corresponds to answers to all four questions.
Exercise 1: How would you answer these questions for possibilistic uncertainty? Or for probabilistic uncertainty?
Exercise 2: As I mentioned before, an answer to Count? is a set B(S) for each set S. What about for Certainty? Collapse? and Combine?
What's a (commutative) monad?
Monads were born of category theory — a field of mathematics which many regard as arcane, mystical, or downright kabbalistic — but monads can (I think) be understood by someone lacking any acquaintance with category theory whatsoever. Indeed, my claim in this essay is that monads correspond exactly to Map-Territory-like relations, and such relations will be familiar to anyone who's both got a brain and pondered this predicament.
I'll first write down the mathematical definition of a monad, and then I'll explain how this definition mirrors the four epistemological questions.
Definition: A monad (B,η,⊳) consists of three operators[3]:
The construct operator B which assigns a set B(S) to each set S.
The return operator η which assigns a function ηS:S→B(S) to each set S.
The bind operator ⊳ which assigns a function ⊳WS:B(W)×(W→B(S))→B(S) to each pair of sets W,S.
Moreover, a commutative monad (B,η,⊳,⊗) is a monad (B,η,⊳) equipped with a fourth operator:
The product operator ⊗ which assigns a function ⊗AB:B(A)×B(B)→B(A×B) to each pair of sets A,B.
These operators must also satisfy some basic algebraic laws to qualify as a (commutative) monad. See here for details.
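For readers who think in code, here's a minimal sketch of the four operators as a Haskell typeclass. This is my own rendering, not a standard library class: Haskell's built-in Monad class already packages the first three operators (return for η and >>= for ⊳), and the class and method names below are hypothetical, chosen to mirror the Four C's we'll meet shortly.

```haskell
-- A sketch of a (commutative) monad's operators. The answer to Count?
-- is the type constructor b itself; the other three C's are methods.
class Uncertainty b where
  certain  :: s -> b s                  -- return operator η: Certainty?
  collapse :: b w -> (w -> b s) -> b s  -- bind operator ⊳: Collapse?
  combine  :: b a -> b a' -> b (a, a')  -- product operator ⊗: Combine?
                                        -- (lawful only for commutative monads)
```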
Notation: I'll use variables x,x′,x′′ for elements of X, and boldface variables x,x′,x′′ for elements of B(X). I may talk loosely of the monad B rather than (B,η,⊳) or of the commutative monad B rather than (B,η,⊳,⊗). I may write ηB, ⊳B, or ⊗B for clarification. I may write w⊳f instead of w⊳WSf, and a⊗b instead of a⊗ABb.
How do they correspond to each other?
In short, there is an exact correspondence between the operators of a (commutative) monad and the four epistemological questions. Let's go one-by-one.
1. Count? What counts as a distinct belief about the election? Concretely, if there are n electoral candidates then how many distinct belief-states are there?
An answer to this question is the construct operator, assigning a set B(S) to each set S. If S is the set of potential outcomes of an event then B(S) is the set of beliefs about the event.
As we discussed before, for possibilistic uncertainty B(S):=P+(S), and for probabilistic uncertainty B(S):=Δ(S).
2. Certainty? If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
Here, an answer will be the return operator assigning a function ηS:S→B(S) to each set S. If you're certain that a state s∈S will occur, then ηS(s)∈B(S) is your belief-state.
For possibilistic uncertainty, ηS(s):={s}∈P+(S), the singleton set containing s. And for probabilistic uncertainty, ηS(s):=δs∈Δ(S), the Dirac distribution at s, given by δs(s′)=1 if s′=s and 0 otherwise.
The function ηS:S→B(S) describes how the state-space embeds in the belief-state-space. This is related, I think, to the idea that each territory can serve as its own map. (See Borges' On Exactitude in Science for an exploration of this theme.) Or in the words of Norbert Wiener, “The best model of a cat is another, or preferably the same, cat.”
3. Collapse? Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
Here, an answer will be the bind operator assigning a function ⊳WS:B(W)×(W→B(S))→B(S) to each pair of sets W and S. You should think of the bind operator as collapsing your second-order beliefs to your first-order beliefs — i.e. if each forecaster w∈W has a first-order belief f(w)∈B(S), and w∈B(W) is your second-order belief about which forecaster is correct, then (w⊳WSf)∈B(S) should be your first-order belief about the election.
For possibilistic uncertainty, w⊳f∈P+(S) is the union ⋃w∈wf(w). And for probabilistic uncertainty, w⊳f∈Δ(S) is the summation s′↦∑w∈W w(w)⋅f(w)(s′).
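Here's a hedged Haskell sketch of these two bind operators, modelling P+ as nonempty lists and Δ as association lists of outcome/probability pairs. Both modelling choices are mine, made for brevity, and not canonical.

```haskell
import Data.List.NonEmpty (NonEmpty)

-- Possibilistic bind: the union of f(w) over the elements of ws.
-- NonEmpty's own Monad instance is exactly this.
bindP :: NonEmpty w -> (w -> NonEmpty s) -> NonEmpty s
bindP ws f = ws >>= f

-- Probabilistic bind: the weighted sum s' ↦ Σ_w w(w)·f(w)(s').
-- Duplicate outcomes are left unaggregated, as a formal sum.
type Dist a = [(a, Double)]

bindD :: Dist w -> (w -> Dist s) -> Dist s
bindD ws f = [ (s, p * q) | (w, p) <- ws, (s, q) <- f w ]

-- e.g. a 70/30 second-order belief over two (made-up) forecasters:
-- bindD [("alice", 0.7), ("bob", 0.3)]
--       (\w -> if w == "alice" then [("rain", 0.9), ("dry", 0.1)]
--                              else [("rain", 0.5), ("dry", 0.5)])
--   = [("rain", 0.63), ("dry", 0.07), ("rain", 0.15), ("dry", 0.15)]
```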
This is related to the idea that a map of a map of a territory is a map of that same territory; a depiction of a depiction of a person is a depiction of that same person; a representation of a representation of an idea is a representation of that same idea; etc.
One might think of f:W→B(S) as some parameterisation of the belief-state-space B(S) using some parameters W. Then the bind operator gives us the function for finding your S-belief from your W-belief. Explicitly, this function is (−⊳WSf):B(W)→B(S), w↦w⊳WSf.
Moreover, the bind operator doesn't just flatten one level of "meta". Often we have an entire hierarchy of state-spaces S0,S1,S2,…,Sn where beliefs about Si are parameterised by some "higher" state-space Si+1 via a function fi:Si+1→B(Si). Here, the state-space S0 is the object-level system, the state-space S1 parametrises your first-order beliefs about S0, the state-space S2 parameterises your second-order beliefs about S1, and so on. Then the bind operator says that I can collapse your nth-order beliefs all the way to your first-order beliefs via the function (−⊳fn−1⊳⋯⊳f0):B(Sn)→B(S0).[4]
4. Combine? Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
An answer will be the product operator ⊗ assigning a function ⊗AB:B(A)×B(B)→B(A×B) to each pair of sets A and B. If a∈B(A) is your belief about the first election and b∈B(B) is your belief about an unrelated second election, then a⊗ABb∈B(A×B) is your belief about the pair of elections.
For possibilistic uncertainty, a⊗b∈P+(A×B) is the cartesian product {(a,b)∈A×B:a∈a,b∈b}. And for probabilistic uncertainty, a⊗b∈Δ(A×B) is the joint distribution (a,b)↦a(a)⋅b(b).
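And the two product operators in the same Haskell sketch (same modelling assumptions as before):

```haskell
import Data.List.NonEmpty (NonEmpty)

type Dist a = [(a, Double)]

-- Possibilistic product: the cartesian product of the two sets.
prodP :: NonEmpty a -> NonEmpty b -> NonEmpty (a, b)
prodP as bs = (,) <$> as <*> bs

-- Probabilistic product: the joint distribution (a,b) ↦ a(a)·b(b).
prodD :: Dist a -> Dist b -> Dist (a, b)
prodD as bs = [ ((a, b), p * q) | (a, p) <- as, (b, q) <- bs ]
```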
Thinking of S1×⋯×Sn as a factorisation of the state-space S, the product operator implies that your beliefs about each Si combine to yield your overall belief about S. That is, a commutative monad B corresponds to a flavour of uncertainty that you can have towards parts of the world, whereas a non-commutative monad B corresponds to a flavour of uncertainty that you can only have towards the world in its entirety.
Historical note: The central thesis of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. I call this Myers' correspondence after David Jaz Myers, because I first encountered the idea in his book Categorical Systems Theory, where he devotes a chapter to using commutative monads to model various kinds of nondeterminism in automata. Nonetheless, the idea did not originate with him, he's never claimed it is true, and I don't know if he agrees with it.
Examples of Myers' correspondence
The correspondence between the operators of the (commutative) monad and the epistemological questions also serves as a practical recipe for formalising different flavours of uncertainty using monads. I've personally found it useful. First, think about the particular flavour of uncertainty, then answer the Four C's (Count? Certainty? Collapse? Combine?), convert those answers into mathematical operators, and voilà, you've got yourself a monad.
I'll now zoom through fifteen examples, beginning (without commentary) with the paradigm examples of P+ and Δ.
1 — nonempty powerset monad
Flavour of uncertainty: Possibilistic
Monad: nonempty powerset
Construct B(S): P+(S)
Return ηS(s): {s}
Bind w⊳WSf: ⋃w∈wf(w)
Product a⊗ABb: a×b
Interpretation: x∈x if you consider the outcome x∈X to be possible.
2 — distribution monad
Flavour of uncertainty: Probabilistic
Monad: distribution
Construct B(S): Δ(S)
Return ηS(s): δs, where δs(s′)=1 if s′=s and 0 otherwise
Bind w⊳WSf: s′↦∑w∈W w(w)⋅f(w)(s′)
Product a⊗ABb: (a,b)↦a(a)⋅b(b)
Interpretation: x(x)∈[0,1] is your subjective credence in the outcome x∈X.
3 — reader monad from H
Okay, now let's deal with a flavour of uncertainty which is sometimes called "indeterminacy". An indeterminate belief is something like "Well, if h1 is true then x1, but if h2 is true then x2, but–", i.e. it's a belief which is uncertain because your best guess depends on some unknown variable. More formally, your belief-state is given by a particular function from H (the possible values of the unknown variable) to S (the state-space).
This is an ordinary usage of the word "uncertain" so, by Myers' correspondence, it must correspond to a monad, and we can discover which monad by answering the Four C's. If S is the state-space then the belief-state-space is given by SH, the set of functions s:H→S. So our construct operator is (−)H. If you're certain that the outcome is s∈S then your belief-state is the constant function cs:h↦s. The intuitive answers to Collapse? and Combine? give us our bind and product operators.
Interpretation: x(h)=x if x∈X is your best guess about the outcome conditioned on the information h∈H.
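A quick Haskell sketch of this reader monad, with the type variable h standing in for the set H of values of the unknown variable (my rendering, with made-up names):

```haskell
-- A belief-state is a function from hypotheses to outcomes.
type Indeterminate h s = h -> s

certainR :: s -> Indeterminate h s
certainR s = \_ -> s             -- certainty ignores the hypothesis

-- Collapse: resolve both levels of belief with the same hypothesis h.
bindR :: Indeterminate h w -> (w -> Indeterminate h s) -> Indeterminate h s
bindR g f = \h -> f (g h) h

-- Combine: read off both guesses under the same hypothesis.
prodR :: Indeterminate h a -> Indeterminate h b -> Indeterminate h (a, b)
prodR g g' = \h -> (g h, g' h)
```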
4 — writer monad to [0,1]
Often, people will report their uncertain beliefs like "The coin will land heads (98%)" or "AI will disempower humanity (60%)". That is, their belief is a best guess paired with their confidence, which they offer as a lower-bound on the likelihood that their guess is correct. A certain belief-state would be something like "The coin will land heads (100%)".
What monad corresponds to this flavour of uncertainty?
If S is the state-space then S×[0,1] is the belief-state-space, i.e. there's a distinct belief-state for each pair s=(s,q)∈S×[0,1]. If you're certain that the outcome is s∈S then your belief-state is (s,100%)∈S×[0,1]. Uncertainty is collapsed by multiplying the confidences. Uncertainty is combined also by multiplying the confidences.
Interpretation: x=(x,p) if p∈[0,1] is your confidence in the outcome x∈X, i.e. you think that the likelihood of x∈X is at least p.
Using the writer to [0,1] monad, we've characterised a belief-state as an outcome marked with some additional metadata, namely a confidence p∈[0,1]. What properties of the interval [0,1] did we appeal to in this definition? Well, firstly that we can multiply different elements (see bind and product operators). And secondly, that there's a fixed element such that multiplying with this element does nothing (see return operator).
Hence we can generalise: given any monoid (M,e,⊙) we have a monad B(S)=S×M called the writer-to-M monad.[5] By using different monoids, we can model different flavours of uncertainty, but note that this is only a commutative monad when (M,e,⊙) is a commutative monoid.
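Here's a sketch of the general writer-to-M monad in Haskell, using the standard Monoid class for (M,e,⊙). The confidence example reuses the multiplicative monoid Product Double as a stand-in for [0,1]; the absence of bounds checking is an assumption of this sketch.

```haskell
import Data.Monoid (Product(..))

-- Certainty carries the monoid's identity element e.
certainW :: Monoid m => s -> (s, m)
certainW s = (s, mempty)

-- Collapse combines the metadata with the monoid operation ⊙.
bindW :: Monoid m => (w, m) -> (w -> (s, m)) -> (s, m)
bindW (w, m1) f = let (s, m2) = f w in (s, m1 <> m2)

-- e.g. confidences multiply under collapse (up to floating point):
-- bindW ("heads", Product 0.98) (\g -> (g, Product 0.9))
--   = ("heads", Product 0.882)
```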
There's another ordinary usage of the word "uncertainty" where an uncertain belief would be something like "AGI arrives before 2040 unless there's a nuclear war" and a certain belief would be something like "AGI arrives before 2040." At least, with regards to the binary question of whether AGI arrives before 2040. That is, an uncertain belief is one with an "unless..." clause.
Formalising this, we have a fixed set of events F, and a belief-state is a pair (s,E)∈S×F. Your belief-state is (s,E) when you commit to the state s∈S occurring unless the event E∈F occurs. This flavour of uncertainty corresponds to the writer monad B(S)=S×F, where F is a monoid when equipped with union ∪:F×F→F and the empty set ∅∈F.
One might use this flavour of uncertainty to model various kinds of defeasible reasoning, where a belief-state (s,E) is characterised by the precondition E under which the belief would be defeated or disavowed.
Flavour of uncertainty: Unless-claused guess
Monad: writer monad to F
Construct B(S): S×F
Return ηS(s): (s,∅)
Bind w⊳WSf: (s,E1∪E2) where w=(w,E1) and f(w)=(s,E2)
Product a⊗ABb: ((a,b),E1∪E2) where a=(a,E1) and b=(b,E2)
Interpretation: x=(x,E) if you think x will occur unless event E occurs.
Or maybe an uncertain belief is one full of amendments, clarifications, conditions, disclaimers, excuses, hedges, limitations, qualifications, refinements, reservations, restrictions, stipulations, temperings, etc. By contrast, a certain belief is made "with no ifs or buts", bare and direct.
Formalising this, we have a fixed set of clarifications C, and a belief-state is a pair (s,l)∈S×List(C). Here, List(C) is the free monoid over the set of clarifications C equipped with concatenation +:List(C)×List(C)→List(C) and the empty list []∈List(C).
Flavour of uncertainty: Clarified guess
Monad: writer monad to List(C)
Construct B(S): S×List(C)
Return ηS(s): (s,[])
Bind w⊳WSf: (s,l1+l2) where w=(w,l1) and f(w)=(s,l2)
Product a⊗ABb: N/A (See below.)
Interpretation: x=(x,l) if you think x will occur and l is a list of your clarifications.
Now, the writer to List(C) monad isn't a commutative monad. Or interpreted philosophically, a clarified guess isn't the kind of uncertainty you can have towards parts of the world. Suppose "I think Alice is happy but I don't know her very well" is my belief-state about Alice, and "I think Bob is happy but he's difficult to read" is my belief-state about Bob. What's my belief-state about both Alice and Bob? Is it (1) "Alice and Bob are both happy, but I don't know Alice very well and Bob is difficult to read" or (2) "Alice and Bob are both happy, but Bob is difficult to read and I don't know Alice very well"? That is, in which order should we combine the clarifications?
The instinctive trick is to declare that two belief-states are equal if the lists of clarifications are equal up to permutation — this implies that (1) and (2) are the same belief-state, which does seem intuitive to me. If we play this trick, then the resulting flavour of uncertainty is captured by the writer-to-N[C] monad, where N[C] is the free commutative monoid. This does indeed give a commutative monad!
Flavour of uncertainty: Unordered clarified guess
Monad: writer monad to N[C]
Construct B(S): S×N[C]
Return ηS(s): (s,0)
Bind w⊳WSf: (s,l1+l2) where w=(w,l1) and f(w)=(s,l2)
Product a⊗ABb: ((a,b),l1+l2) where a=(a,l1) and b=(b,l2)
Interpretation: x=(x,l) if you think x will occur and l is an unordered list of your clarifications.
5 — identity monad
If we're anticipating an election between n candidates, then the simplest way to characterise your belief about the election is by your best guess, with no additional information about how unsure you are. If S is the state-space then S is also the belief-state-space, i.e. there's a distinct belief-state for each s∈S. The set of belief-states is therefore equal (up to bijection) to the set of outcomes itself.
I'll admit that this flavour of uncertainty is somewhat degenerate — e.g. every belief-state is certainty in some particular state — but it's worth including nonetheless. On some readings of Wittgenstein's Tractatus, this is his model of how language represents the world: our utterances stand in direct isomorphism with the states-of-affairs.
Anyway, answering the Four C's would give the identity monad!
Flavour of uncertainty: Best guess
Monad: identity monad
Construct B(S): S
Return ηS(s): s
Bind w⊳WSf: f(w)
Product a⊗ABb: (a,b)
Interpretation: x=x if x∈X is your best guess about the outcome.
6 — maybe monad
The last example was a bit silly, so how about this instead..?
If we're anticipating an election between n candidates, then I'll characterise your belief about the election either by your best guess (with no additional information) or by an "I don't know" response. This is a very coarse-grained flavour of uncertainty — the only belief-state about the election (other than certainty in a particular candidate) is the belief-state of utter cluelessness, or shrugging one's shoulders!
Despite the coarse-grainedness, it's pretty commonly encountered in the wild. For example, it's the typical flavour of uncertainty encountered in surveys/questionnaires, where ⊥ is read as "no opinion/don't know". It's also encountered in voting, where ⊥ is read as "abstention".
Formally speaking, if S is the state-space then there's a distinct belief-state for each state s∈S plus an additional option denoted ⊥. The belief-state-space is therefore S+1, denoting the disjoint union of S with the singleton set {⊥}. If you're certain that the outcome is s∈S then your belief-state is s∈S. This flavour of uncertainty corresponds to the famous maybe monad.
Flavour of uncertainty: guess-or-shrug
Monad: maybe monad
Construct B(S): S+1
Return ηS(s): s
Bind w⊳WSf: ⊥ if w=⊥, f(w) otherwise
Product a⊗ABb: ⊥ if a=⊥ or b=⊥, (a,b) otherwise
Interpretation: x=x if x∈X is your best guess for the outcome, and x=⊥ if you offer no best guess.
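This one is literally Haskell's own Maybe monad, with Nothing playing the role of ⊥; the bind and product are spelled out below for completeness.

```haskell
bindMaybe :: Maybe w -> (w -> Maybe s) -> Maybe s
bindMaybe Nothing  _ = Nothing        -- a shrug stays a shrug under collapse
bindMaybe (Just w) f = f w

prodMaybe :: Maybe a -> Maybe b -> Maybe (a, b)
prodMaybe (Just a) (Just b) = Just (a, b)
prodMaybe _        _        = Nothing -- either shrug spoils the pair
```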
7 — K-distribution monad
You might, at this point, feel short-changed. I've discussed so far a range of flavours of uncertainty which are all coarser-grained than probabilistic uncertainty, so why not stick to Δ? Let's consider then a more fine-grained characterisation of belief-state, one that tracks infinitesimal differences between probability assignments.
The Levi-Civita field is an extension of the real numbers which contains infinitesimal values like ϵ, ϵ2, 2ϵ+ϵ2, π2√ϵ and infinite values like ϵ−1, ϵ−2, ϵ1/3+ϵ−1/3+2. We can replace [0,1] in the definition of Δ with LCF to obtain a monad ΔLCF corresponding to this flavour of uncertainty. On this account, a belief-state x∈ΔLCF(X) is something which tracks the potentially infinitesimal likelihood x(x)∈LCF of each outcome x∈X. This flavour of uncertainty has applications in infinite ethics and cooperation in large worlds.
For example, in a universe with infinite radius ϵ−1, what's your prior likelihood that you occupy the most central galaxy? Presumably, the likelihood should be ϵ3/ρ∈LCF, where ρ∈R+ is the density of galaxies.
Now suppose you were offered a lottery which promises to benefit everyone by δ if you indeed occupy the most central galaxy but otherwise benefits no one. What's this lottery worth? Presumably, it's worth δ, because the infinitary stakes δ⋅ρ⋅ϵ−3 are cancelled out by the infinitesimal chance of winning ϵ3/ρ.
Note that because LCF is totally-ordered, once we assign LCF values to different lotteries, we can perform expected utility maximisation as usual, and get sensible results. I think that infinitesimal probabilities resolve some (but not all) problems in infinite ethics. I'm particularly lured by the hope that, in an infinite cosmos, the infinitary stakes might somehow cancel out with infinitesimal probabilities to yield finite values. See Joe Carlsmith's essay On Infinite Ethics for further discussion.
Flavour of uncertainty: infinitesimal probabilistic
Monad: LCF-distribution monad
Construct B(S): ΔLCF(S)
Return ηS(s): δs, where δs(s′)=1 if s′=s and 0 otherwise
Bind w⊳WSf: s′↦∑w∈W w(w)⋅f(w)(s′)
Product a⊗ABb: (a,b)↦a(a)⋅b(b)
Interpretation: x(x)∈LCF is your potentially infinitesimal subjective credence in the outcome x∈X.
How far can one generalise the kind of entity that a "probability" must be, before our definition breaks? Well, so long as we have some rig K, we can define a monad ΔK by replacing [0,1] with K. A rig is a set K equipped with a zero element 0∈K, a unit element 1∈K, an addition function ⊕:K×K→K, and a multiplication function ⊗:K×K→K, satisfying certain algebraic laws. By choosing different rigs K then we obtain different monads ΔK corresponding to different flavours of uncertainty.
When K:=[0,1] we obtain the ordinary probability distributions, and when K:=Q∩[0,1] we obtain the rational probability distributions, etc. Tobias Fritz suggests that by using similar tricks we might obtain quantum uncertainty, fuzzy uncertainty, and Dempster–Shafer uncertainty, but I haven't checked whether this is true.
Flavour of uncertainty: K-probabilistic
Monad: K-distribution monad
Construct B(S): ΔK(S)
Return ηS(s): δs, where δs(s′)=1 if s′=s and 0 otherwise
Bind w⊳WSf: s′↦∑w∈W w(w)⋅f(w)(s′)
Product a⊗ABb: (a,b)↦a(a)⋅b(b)
Interpretation: x(x)∈K is your subjective credence in the outcome x∈X, where K is some rig of exotic probabilities.
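A sketch of ΔK in Haskell, with a hypothetical Rig class standing in for the rig K. Only the operations that return and bind actually use are included; addition would enter when aggregating duplicate outcomes.

```haskell
-- A hypothetical Rig class, trimmed to the operations used below.
class Rig k where
  one   :: k
  times :: k -> k -> k

instance Rig Double where     -- the ordinary probabilistic case
  one   = 1
  times = (*)

type DistK k a = [(a, k)]

certainK :: Rig k => a -> DistK k a
certainK a = [(a, one)]       -- the Dirac distribution at a

bindK :: Rig k => DistK k w -> (w -> DistK k s) -> DistK k s
bindK ws f = [ (s, p `times` q) | (w, p) <- ws, (s, q) <- f w ]
```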
8 — quantum monad
For sure, quantum mechanics is endowed with its own flavour of uncertainty, hence the term Heisenberg's Uncertainty Principle. It's not impossible to catch a physicist saying "it's uncertain whether the qubit is 0 or 1" or "it's uncertain whether the cat is alive or dead", regardless of whether they consider quantum uncertainty as strictly speaking epistemic. By Myers' correspondence, this flavour of uncertainty must correspond to a monad.
9 — smooth state monad
The position of the North Star in the night sky is constant, static, immutable, certain; the position of Mercury, by contrast, is variable, dynamic, mutable, uncertain. Is this not a common sense of the word? Might one not say that my belief-state about Mercury's position will forever be uncertain, no matter how accurate my telescope or exhaustive my calculations, because my belief is always being revised? If so, then by Myers' correspondence this flavour of uncertainty corresponds to a monad.
To formalise this, let's fix a differentiable manifold Θ parameterising your internal mental state as you think about a question. Note that because Θ is a differentiable manifold, it's equipped with tangent space Tθ at every θ∈Θ.
If S is the state-space, then ∏θ∈Θ(S×Tθ) is your belief-state-space. In other words, we have a distinct belief-state for each smooth transition function s∈∏θ∈Θ(S×Tθ). A belief-state s is characterised by a pair s(θ)=(s,v) for each θ∈Θ, where s∈S is your current guess and v∈Tθ is the tangent vector describing how your mental state is evolving. If you're certain that the winner is s∈S then your belief-state is the static transition function η(s):θ↦(s,0) where 0∈Tθ is the zero vector.
This is the smooth state monad — it's a differentiable version of the discrete-time state monad, with the additional benefit that it's a commutative monad.
Flavour of uncertainty: evolving guess
Monad: smooth state monad
Construct B(S): ∏θ∈Θ(S×Tθ)
Return ηS(s): θ↦(s,0)
Bind w⊳WSf: θ↦(s,v1+v2) where w(θ)=(w,v1) and f(w)(θ)=(s,v2)
Product a⊗ABb: θ↦((a,b),v1+v2) where a(θ)=(a,v1) and b(θ)=(b,v2)
Interpretation: the transition function x∈∏θ∈Θ(X×Tθ) describes how your internal mental state evolves over time and produces guesses.
10 — continuation monad
What are belief-states actually for anyway? What purpose do they play in rational decision-making? According to one school of thought, belief-states are simply gadgets for taking expected values, and chiefly for taking expected utility values.
Let's say S is the set of candidates running in the election, and v:S→R is your utility function, i.e. v(s)∈R measures how happy you'd be to hear that the candidate s∈S has won. Then your ex-ante utility is some r∈R measuring how happy you are now in anticipation of the outcome. Given your belief-state, I should be able to determine r∈R from v:S→R, which implies that I can just characterise your belief-state about the election by how r∈R is determined from v:S→R. Neat.
This is formalised by the so-called continuation to R monad. If S is the state-space then K(S,R) is the belief-state-space, where K(S,R) is the set of functionals s:(S→R)→R. And a belief-state s:(S→R)→R is certain in the outcome s∈S if s determines your ex-ante utility simply by evaluating your utility function at s, i.e. s=λv:S→R.v(s).
The continuation monad encompasses both possibilistic uncertainty and probabilistic uncertainty. If the nonempty subset A∈P+(X) models your possibilistic uncertainty then the associated functional x∈K(X,R) is given by λv:X→R.min{v(x):x∈A}. If the distribution μ∈Δ(X) models your probabilistic uncertainty then the associated functional x∈K(X,R) is given by λv:X→R.Ex∼μ[v(x)].
Flavour of uncertainty: ex-ante utility
Monad: continuation monad
Construct B(S): K(S,R)
Return ηS(s): v↦v(s)
Bind w⊳WSf: v↦w(λw∈W.f(w)(v))
Product a⊗ABb: N/A (Unfortunately, K(−,R) is not a commutative monad.[7])
Interpretation: if v:X→R assigns your ex-post utility v(x)∈R to each outcome x∈X, then x(v)∈R is your ex-ante utility.
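Here's a Haskell sketch of the continuation monad together with the two injections that Exercise 4 (below) asks about: possibilistic beliefs act by minimisation, probabilistic beliefs by expectation. This is my rendering, using lists for brevity.

```haskell
type Cont s = (s -> Double) -> Double

certainC :: s -> Cont s
certainC s = \v -> v s               -- evaluate the utility function at s

bindC :: Cont w -> (w -> Cont s) -> Cont s
bindC k f = \v -> k (\w -> f w v)

-- Injection of possibilistic uncertainty (a nonempty list is assumed):
fromSet :: [s] -> Cont s
fromSet xs = \v -> minimum (map v xs)

-- Injection of probabilistic uncertainty:
fromDist :: [(s, Double)] -> Cont s
fromDist d = \v -> sum [ p * v s | (s, p) <- d ]
```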
Exercise 4: (Beginner) Prove that the two maps P+(X)→K(X,R) and Δ(X)→K(X,R) are injections. (Advanced) Prove these injections are monad transformers.[8]
11 — signature monad
Maybe I should characterise your belief-state about something by the sentence that you'd utter about the outcome. This will result in a more syntactic or linguistic account of belief. You might imagine here a shared language, like English or Python, with which a speaker may report their beliefs to a friend. Or you might imagine a private mental language in which a brain/AI will store their knowledge about the world.
To make this rigorous, I must introduce a language containing all the sentences that you might utter about the outcome. Our language will include an atomic sentence ┌s┐ for every outcome s∈S, along with certain connectives for combining sentences. For example, suppose we have a language with two symbols, a binary connective ∨ called disjunction and a unary connective ¬ called negation. If S:={s1,…,sn} are the candidates in an election, then a belief-state about the electoral outcome is a sentence like ┌s5┐ or ┌(s2∨¬s3)∨¬(s4∨s6)┐.
The logical connectives can be specified by a signature (Σ, arity:Σ→N). A signature is a set Σ equipped with a map arity:Σ→N sending each connective to its arity. So the aforementioned language has the signature Σ={∨,¬} with arity(∨)=2 and arity(¬)=1.
We denote the resulting set of sentences by L(Σ,S). This is a set containing all the sentences freely generated from S using the connectives in Σ. Explicitly, L(Σ,S) is the smallest set such that ┌s┐∈L(Σ,S) for every s∈S, and ┌σ(ϕ1,…,ϕk)┐∈L(Σ,S) for every σ∈Σ with arity(σ)=k and all ┌ϕ1┐,…,┌ϕk┐∈L(Σ,S).
With this machinery in place, we can answer the Four C's, and thereby find the corresponding monad.
If S is the state-space then there's a distinct belief-state for each sentence s∈L(Σ,S).
If you're certain that the winner of the election is s∈S, then your belief-state is the sentence ┌s┐.
Let f:W→L(Σ,S) be the function assigning to each forecaster w∈W their belief-state f(w)∈L(Σ,S) about the election. And let w∈L(Σ,W) be your belief-state about the forecasters. Then your belief s∈L(Σ,S) about the election itself is given by uniform substitution: loop through the sentence w∈L(Σ,W) and, every time you come across an atomic letter w∈W, replace it with the sentence f(w)∈L(Σ,S). This results in a sentence s∈L(Σ,S).[9]
Unfortunately, L(Σ,−) isn't generally a commutative monad.[10]
Flavour of uncertainty: utterance in a language
Monad: signature monad L(Σ,−)
Construct B(S): L(Σ,S)
Return ηS(s): ┌s┐
Bind w⊳WSf: uniform substitution of every w∈W with f(w)∈L(Σ,S) in the sentence w
Product a⊗ABb: N/A
Interpretation: x∈L(Σ,X) is the sentence that you would utter about the outcome, in a language which contains an atomic letter for each outcome x∈X and a logical connective for each σ∈Σ.
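A Haskell sketch of the signature monad for the example signature Σ={∨,¬}, with sentences as a free term structure over atoms and bind as uniform substitution (my rendering):

```haskell
data Sentence s = Atom s
                | Or  (Sentence s) (Sentence s)
                | Not (Sentence s)

certainL :: s -> Sentence s
certainL = Atom                       -- the atomic sentence ┌s┐

-- Uniform substitution: replace each atom w with the sentence f(w).
bindL :: Sentence w -> (w -> Sentence s) -> Sentence s
bindL (Atom w) f = f w
bindL (Or p q) f = Or  (bindL p f) (bindL q f)
bindL (Not p)  f = Not (bindL p f)
```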
Many monads are equivalent to L(Σ,−) for some signature Σ, including many monads we've already encountered.
When Σ=∅, then L(Σ,−) is equivalent to the identity monad. This is intuitive. If there are no connectives in the language, then every utterance is a single atomic sentence positing one of the outcomes.
When Σ={⊥} consists of one constant symbol (i.e. zero-arity connective) then L(Σ,X) will contain the atomic sentences ┌x┐ plus one additional sentence ┌⊥┐. So L(Σ,−) is equivalent to the maybe monad. We encountered this before as modelling the guess-or-shrug flavour of uncertainty.
When Σ={⊥,⊥′,⊥′′,…} consists of many constant symbols, then L(Σ,X) will contain atomic sentences ┌x┐ plus additional sentences ┌⊥(k)┐ for every ⊥(k)∈Σ. So L(Σ,−) is equivalent to what's called the exception monad (−+Σ). This is like the guess-or-shrug, except there are multiple ways to shrug one's shoulders.
When Σ={¬} consists of one unary connective, then L(Σ,X) will contain sentences like ┌¬¬¬¬x┐. So L(Σ,−) is equivalent to the writer monad to the monoid (N,+,0). If Σ consists of many unary connectives, then L(Σ,−) is equivalent to the writer monad to List(Σ). We encountered this before as modelling the clarified guess.
When Σ={∨} consists of one binary connective, then L(Σ,X) will consist of sentences like ┌(x1∨((x2∨x2)∨x3))∨(x5∨x8)┐. So L(Σ,X) is equivalent to the set of full binary trees over X. As Vanessa Kosoy notes, "we think of such a tree as a way to select an element of X by reading a stream of bits." (See here.)
Isn't the archetypal symbol of uncertainty... a fork in the road? Imagine a traveller facing two paths, left and right, each forking further ahead, and so on unboundedly, forming a fractal canopy of binary choices.
12 — algebraic theory
There's something a bit perverse about characterising your belief-state with a single utterance about the outcome. Namely, some utterances will be logically equivalent to each other, such as ┌ϕ┐ and ┌(ϕ∨ϕ)┐, and therefore the belief-state in which you're willing to utter ┌ϕ┐ is the exact same as the belief-state in which you're willing to utter ┌(ϕ∨ϕ)┐, assuming that you're both rational and honest. Therefore, our previous characterisation was overcounting the belief-states by distinguishing logically-equivalent sentences. Bizarrely, there would be infinitely-many belief-states about a single coin flip — i.e. ┌H┐, ┌(H∨H)┐, ┌(H∨(H∨H))┐, and so on.
To fix this, what we need isn't just a signature Σ, but rather a signature Σ paired with a set E of equational axioms, which is called an algebraic theory. An equational axiom is a pair of sentences built using the connectives in Σ and some placeholder sentence variables {a,b,c,…}. We use E to define an equivalence relation ∼E on L(Σ,X) by taking the deductive closure of the axioms, and then the equivalence classes of the sentences will be our belief-states.
For example, if our signature is {∨} and we intend to interpret the ∨ connective as disjunction, then E should consist of three axioms:
Idempotency, (a∨a)=a
Commutativity, (a∨b)=(b∨a)
Associativity, ((a∨b)∨c)=(a∨(b∨c))
Furnished with the concept of an algebraic theory, we can now improve our answers:
If S is the state-space then there is a distinct belief-state for each equivalence class of sentences [ϕ]E:={┌ψ┐∈L(Σ,S):┌ϕ┐∼E┌ψ┐}. This set of equivalence classes is denoted L(Σ,E,S):={[ϕ]E:┌ϕ┐∈L(Σ,S)}.
If you're certain that the winner is s∈S, then your belief-state is the equivalence class [s]E∈L(Σ,E,S).
Let f:W→L(Σ,E,S) be the function assigning to each forecaster w∈W their belief-state f(w)∈L(Σ,E,S) about the election. And let w∈L(Σ,E,W) be your belief-state about the forecasters. Then your belief s∈L(Σ,E,S) about the election itself is given by uniform substitution modulo equivalence. We know w=[ϕ]E for some ┌ϕ┐∈L(Σ,W) and that f(w)=[F(w)]E for some F:W→L(Σ,S). Then w⊳f=[ϕ⊳L(Σ,−)WSF]E where ⊳L(Σ,−)WS is the bind operator for the signature monad. This operation is well-defined because the deductive system satisfies referential transparency — i.e. if ┌ϕi┐∼E┌ϕ′i┐ then ┌σ(ϕ1,…,ϕn)┐∼E┌σ(ϕ′1,…,ϕ′n)┐.
Again, L(Σ,E,−) isn't generally a commutative monad.
Flavour of uncertainty: equivalence class of utterances
Monad: utterances-modulo-equivalence
Construct B(S): L(Σ,E,S)
Return ηS(s): [s]E
Bind w⊳WSf: [ϕ⊳L(Σ,−)WSF]E where w=[ϕ]E and ∀w∈W.f(w)=[F(w)]E
Product a⊗ABb: N/A
Interpretation: x∈L(Σ,E,X) is the set of sentences that you would assert about the outcome, in a language which contains an atomic letter for each outcome x∈X, a logical connective for each σ∈Σ, and where E is the set of equational axioms governing the connectives of Σ.
If a monad B is equivalent to L(Σ,E,−) for some algebraic theory (Σ,E) then we call (Σ,E) a presentation of the monad.[11] A presentation of a monad is a rather nice description of a flavour of uncertainty via some operators for defining belief-states in terms of other belief-states and some rules governing those operators.
When E is empty, then L(Σ,E,−) is obviously just the signature monad L(Σ,−).
When Σ={□p:p∈[0,1]} contains a unary connective for every p∈[0,1] and E contains the axioms □p□qa=□p⋅qa and □1a=a, then L(Σ,E,−) is equivalent to the writer monad to the monoid [0,1]. We encountered this before as the confidence-marked guess. In general, we can give a similar presentation for the writer monad to any monoid (M,e,⊙). So the unless-claused guess has a similar presentation.
When Σ={∨} and E is idempotency, commutativity, and associativity (shown above), then there is a distinct class [ϕ]E∈L(Σ,E,X) for each non-empty finite subset of X. So L(Σ,E,−) is equivalent to the nonempty finite powerset monad P+f. This is a finitary version of the monad P+ which we've encountered as modelling possibilistic uncertainty. This algebraic theory is also called the theory of semilattices.
Let's find a presentation for Δ, the distribution monad. The signature {+p:p∈(0,1)} will contain a binary connective for every p∈(0,1). Our axioms will be a+pa=a (skew-idempotency), a+pb=b+1−pa (skew-commutativity), and ((a+pb)+qc)=(a+p⋅q(b+((1−p)⋅q)/(1−p⋅q)c)) (skew-associativity). You should think of ┌ϕ+pψ┐ as p units of ┌ϕ┐ and (1−p) units of ┌ψ┐, which explains the ghastly expression for skew-associativity. This algebraic theory is called the theory of convex algebras.
Exercise 5: Find a presentation for ΔK for an arbitrary rig K.
13 — convex powerset of distributions monad
As we saw before, the continuation monad K(−,R) encompasses both possibilistic and probabilistic uncertainty. Unfortunately K(−,R) lacks any presentation, even if we allow connectives with infinite arity![12] Fortunately, there exists a monad encompassing both possibilistic and probabilistic uncertainty which is presentable.
Recall that the nonempty finite powerset monad P+f, which corresponds to possibilistic uncertainty, is presented by the theory of semilattices (Σ1,E1). And the distribution monad Δ, which corresponds to probabilistic uncertainty, is presented by the theory of convex algebras (Σ2,E2). Consider the theory (Σ1∪Σ2,E1∪E2∪D) where D={a+p(b∨c)=(a+pb)∨(a+pc)} is an additional axiom describing how the +p connectives distribute over the ∨ connective.
This new theory is a presentation of the convex powerset of distributions monad. This monad, denoted by C, corresponds to a flavour of uncertainty wherein a belief-state is a convex set of distributions, e.g. "The coin lands either heads (20-30%) or tails (70-80%)." (See credal sets.)
Now, we could have defined C in an entirely non-syntactic way, i.e. "C(X) is the set of nonempty finitely-generated convex-closed sets of finite-support distributions over X." But I think the syntactic definition, in terms of the algebraic theories for P+ and Δ, elucidates why C is a well-motivated unification of probabilistic and possibilistic uncertainty. We will employ a similar strategy for motivating infrabayesianism — roughly speaking, infrabayesianism is exactly what you get when you combine probabilistic and possibilistic uncertainty with reward.
Flavour of uncertainty: imprecise probability
Monad L(Σ,E,−): convex powerset of distributions monad
Signature Σ: {∨}∪{+p:p∈(0,1)}
Axioms E:
∨ is a semilattice, i.e. a∨a=a, a∨b=b∨a, a∨(b∨c)=(a∨b)∨c.
{+p:p∈(0,1)} is a convex algebra, i.e. a+pa=a, a+pb=b+1−pa, ((a+pb)+qc)=(a+p⋅q(b+((1−p)⋅q)/(1−p⋅q)c)).
+p distributes over ∨, i.e. a+p(b∨c)=(a+pb)∨(a+pc).
Interpretation:
┌x┐ is certainty in an outcome x∈X.
┌ϕ1∨ϕ2┐ is possibilistic uncertainty between ┌ϕ1┐ and ┌ϕ2┐.
┌ϕ1+pϕ2┐ is probabilistic uncertainty between ┌ϕ1┐ (with chance p) and ┌ϕ2┐ (with chance 1−p).
14 — free convex lattice monad
There's a common usage of the word "uncertainty", where the uncertainty is modulo strategic choice. For example, you might hear "Black is certain to win" from a chess commentator if Black can force a checkmate, or hear "the winner is still uncertain" from a poker commentator during the flop. By Myers' correspondence, this flavour of uncertainty — call it "ludic uncertainty" — must correspond to some monad, but which?
Consider the theory of convex lattices — with signature ΣG={∨,∧,0,1}∪{+p:p∈(0,1)} and axioms EG stating that:
{∨,∧,0,1} is a lattice;
{+p:p∈(0,1)} is a convex algebra;
+p distributes over ∨ and ∧, i.e. a+p(b∨c)=(a+pb)∨(a+pc) and a+p(b∧c)=(a+pb)∧(a+pc).
Then G:=L(ΣG,EG,−) is a monad corresponding, I think, to the aforementioned flavour of uncertainty. It sends a set X to the set G(X), the free convex lattice over X. An element of G(X) should be read as a game-tree whose non-leaf nodes are either a free binary choice by White, a free binary choice by Black, or a biased coin flip. The leaf nodes may be either wins for White, wins for Black, or an element of the set X.
We treat game-trees g,g′∈G(X) as equivalent if the same outcome would result from g and g′ regardless of the players' preferences over the elements of X. For example, the axioms ϕ∨0=ϕ and ϕ∧1=ϕ will hold because no player would willingly choose to lose, and the axioms ϕ∨(ϕ∧ψ)=ϕ and ϕ∧(ϕ∨ψ)=ϕ establish that the players are adversarial, i.e. would never willingly empower one another.
Exercise 7: Consider the game ┌((1∨x2)∧((x2+0.80)∨(x2∧1)))∧(x2∨(x5+0.5(x3∧x4)))┐. Which outcome is (ludically) certain?
Note that the elements of G(X) aren't really games in the usual sense, because leaf nodes might be elements of X, and we treat these elements as pairwise incomparable to both players. So you should think of G(X) as a set of partially-specified game trees. A fully-specified game tree would be an element of G([0,1]), which is a game tree where each leaf-node returns some [0,1]-valued utility to Black and disutility to White. You may notice that [0,1] can itself be equipped with the structure of a convex lattice, which just means there exists a G-algebra V:G([0,1])→[0,1].[14] This G-algebra is exactly the well-known minimax evaluation used in combinatorial game theory.
Flavour of uncertainty: ludic
Monad L(Σ,E,−): free convex lattices
Signature Σ: {∧,∨,0,1}∪{+p:p∈(0,1)}
Axioms E:
{∧,∨,0,1} is a lattice.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over both ∧ and ∨, i.e. a+p(b∨c)=(a+pb)∨(a+pc) and a+p(b∧c)=(a+pb)∧(a+pc).
Interpretation:
┌x┐ is a game which will certainly result in outcome x∈X.
┌0┐ is a game where White wins and ┌1┐ is a game where Black wins.
┌ϕ∧ψ┐ is a game where White can choose to play ϕ or to play ψ.
┌ϕ∨ψ┐ is a game where Black can choose to play ϕ or to play ψ.
┌ϕ+pψ┐ is a game where ϕ is played with chance p and ψ with chance 1−p.
15 — infrabayesianism
When agents have beliefs about the same environment that they're embedded in, weird things can happen. Over the past few years, Vanessa Kosoy and Alex Appel have been exploring a novel flavour of uncertainty — infrabayesian uncertainty — which they claim more fruitfully characterises the belief-states of embedded agents. In particular, it characterises belief-states concerning Newcomb-like environments, where the state of the environment is correlated with the agent's choice under consideration. Their flavour of uncertainty corresponds to the infrabayesian monad □.
Roughly speaking, □ is the same as G above except without the ∧ connective. Consider (Σ,E), the theory of convex semilattices with top and bottom, which is a presentation of the composite monad P+f∘Δ∘(−+2).[15] From what I understand, this monad P+f∘Δ∘(−+2) is Kosoy's infrabayesian monad □.[16] This justifies the claim that infrabayesianism is the flavour of uncertainty that minimally encompasses possibilistic uncertainty (via the P+f monad), probabilistic uncertainty (via the Δ monad), and reward (via the (−+{0,1}) monad). I think that this motivates infrabayesianism as a characterisation of an agent's belief-state about their environment.
Flavour of uncertainty: infrabayesian
Monad L(Σ,E,−): infrabayesian monad
Signature Σ: {∨,0,1}∪{+p:p∈(0,1)}
Axioms E:
∨ is a semilattice with a∨0=a and a∨1=1.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over ∨, i.e. a+p(b∨c)=(a+pb)∨(a+pc).
Interpretation:
┌x┐ is an environment which certainly results in outcome x∈X.
0 is an impossible/contradictory environment where the agent achieves no disutility, called Nirvana.
1 is an environment where the agent suffers maximal disutility.
(ϕ∨ψ) is an environment which is either like ϕ or like ψ, and our agent should be pessimistic here.
(ϕ+pψ) is an environment which is like ϕ with chance p and ψ with chance 1−p.
Unfortunately, □ isn't a commutative monad, which means it's not a flavour of uncertainty that you can have towards parts of the world, but only towards the world in its entirety. Put starkly, there's no way to combine my infrabayesian belief-states about two coin tosses to yield a single infrabayesian belief-state about the pair of coin tosses, even when the coin tosses are completely unrelated.[17] This, I think, limits both the theoretical appeal of infrabayesianism and its tractability.
Theoretically speaking, the fact that □ isn't a commutative monad weakens the analogy between infrabayesian uncertainty and possibilistic or probabilistic uncertainty. Many concepts built upon possibilistic or probabilistic uncertainty appeal, in an essential way, to the product operators ⊗P+ or ⊗Δ. And infrabayesianism, lacking such an operator, is not guaranteed to have analogous concepts.
Practically speaking, the lack of an infrabayesian product operator is an obstacle to parallelising algorithms which assume infrabayesian belief-states. There is no way to decompose the environment into separate components, discover an infrabayesian belief-state for each component, and then combine those belief-states into a single belief-state about the environment as a whole.
Implications for AI safety
Does this essay have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this essay is maybe agent foundations foundations or something like that. But I feel compelled to offer some practical implications for AI safety to validate my decision to write this essay and your decision to read it.
One lesson is that uncertainty comes in many flavours, and formalising different flavours of uncertainty isn't mathematically challenging. Just ask yourself the Four C's (Count? Certainty? Collapse? Combine?) and you've got yourself a monad.
Often, you can replace one monad in a formalism with another and everything will still type-check. For example, a stochastic Markov decision process is given by a transition function τ:A×S→Δ(O×S). One can generalise this to τ:A×S→B(O×S) for any of the monads B we've met so far.
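As a hedged sketch of what "parametric in the monad" looks like in Haskell (all type names below are placeholders of my own choosing): the same transition skeleton typechecks for any monad whatsoever, so swapping flavours of uncertainty is a one-line change.

```haskell
-- A transition function parametric in the monad b.
type Transition b s a o = (a, s) -> b (o, s)

-- Two steps of any such process, for an arbitrary monad b:
twoSteps :: Monad b => Transition b s a o -> s -> a -> a -> b ((o, o), s)
twoSteps t s0 a1 a2 = do
  (o1, s1) <- t (a1, s0)
  (o2, s2) <- t (a2, s1)
  return ((o1, o2), s2)
```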
If you're conducting active research into agent foundations, then instead of assuming a fixed flavour of uncertainty (e.g. possibilistic, probabilistic, infrabayesian, etc), perhaps see if you can generalise the theory to an arbitrary monad, or at least an arbitrary commutative monad. I call such theories "parametric in the monad". If you're gonna do foundational work, it often pays to make it highly parametric, even if you only care about a specific case.
The theory will be robust to errors about the appropriate flavour of uncertainty.
If you want to account for another flavour of uncertainty, you'll have saved yourself time, effort, and ink.
You've got more data points to sanity-check the theory — do you get sensible answers when you plug in different monads, e.g. P+f,id,Δ,□,(−×[0,1]) etc?
If your solution to AI safety involves, at some step, building a formal model of the environment (cf. Davidad's Open Agency Architecture) or of a human (cf. imitative amplification), then this model should carry all the flavours of uncertainty that actually characterise your belief-state about the system. And you shouldn't feel compelled to shoe-horn all your uncertainties into a probability distribution. For example, unless-claused uncertainty seems pretty fundamental — we commit to our stochastic models of the environment and/or a human only within a narrow range of situations — and this flavour of uncertainty seems irreducible to probabilistic uncertainty.
Further questions
In so far as "flavours of uncertainty" is an informal term, there's little we can do to test the correspondence other than enumerating well-known flavours of uncertainty and checking that they do in fact correspond to monads, and vice-versa, enumerating the well-known monads and giving them natural doxastic interpretations. I think my own attempt has been positive, but this result is open to revision.
Secondly, the biggest asterisk of my essay: my treatment of belief-states has been silent on their most important property, namely that they are learned. For example, a probability distribution can be conditioned on new evidence, and possibilistic uncertainty also carries an analogous notion of conditioning. Perhaps any characterisation of belief should answer additional questions about how those belief-states are revised in light of new evidence/observations/considerations, etc. Perhaps we should append to Count? Certainty? Collapse? Combine? a fifth question, Condition? I'm sympathetic to this worry.
And if indeed learning is a phenomenon which must be modelled by any characterisation of belief, then monads do not themselves carry enough structure to characterise beliefs. Rather, we would need to equip the monad B with some additional structure, perhaps a family of maps learnS:O(S)×B(S)→B(S) for some space of observations O(S), possibly satisfying some additional constraints such as learnS(o,−)∘ηS=ηS and learnS(o1⋅o2,−)=learnS(o2,−)∘learnS(o1,−). I'm just improvising here.
This is best left to future work, if the need arises.
Traditionally, the field of analytic epistemology has been concerned with defining epistemological concepts — i.e. constructing definitions for the concepts of knowledge, belief, evidence, learning, testimony, justification, etc. However, in recent years analytic epistemology has reorientated itself, chiefly under the influence of Timothy Williamson, towards modelling epistemological phenomena — i.e. constructing mathematical models for phenomena relating knowledge, belief, evidence, learning, testimony, justification, etc. This reorientation in epistemology, from concept-defining to model-building, was inspired by the natural sciences.
An operator F assigns, to every set X, another set/function F(X).
For example, P is the powerset operator, which assigns to every set X another set P(X). You can informally think of an operator as a function — but strictly speaking, an operator can't be a function because its domain would be the "set of all sets" (which doesn't exist).
Formally, the domain of an operator is something called a category. Categories can be larger than sets — in particular there is a category containing all the sets and the functions between them. For pedagogical purposes, I've framed everything in this article in terms of sets and functions, but most of the content of this article can applied to any category with enough structure.
Acknowledgements:
This research began during the SERI MATS program, under the joint mentorship of John Wentworth, Nicholas Kees, and Janus. Thanks also to Davidad, Jack Sagar, and David Jaz Myers for discussion.
Abstract:
I think that there is a uniform correspondence between flavours of uncertainty and monads taking state-spaces to belief-state-spaces, for different characterisation of belief. In this essay, I describe this correspondence explicitly and list 15 diverse and well-motivated examples. I explore some applications to model-building and agent foundations. Along the way, I characterise infrabayesianism uncertainty as the minimal way to encompass possibilistic uncertainty, probabilistic uncertainty, and reward.
No prerequisites are required beyond a high-school familiarity with sets, functions, real numbers, etc. Feedback welcome.
Introduction
Suppose I'm facing the following problem. There's an upcoming election between n candidates, and you're uncertain who will win. How can I model both your belief about the election and the election itself in a coherent way? By "belief" here, I mean your epistemic attitude, your internal model, your opinion, judgement, prediction, etc, etc. Think map-territory distinction: the election is the territory, your belief is the map, and I need to model both the map and the territory coherently despite the fact that the map and the territory are (typically speaking) two completely different types of thing.
Well, to model the election itself, I'll use a set S={s1,s2,s3,…sn} with an element for each electoral candidate. To represent your belief about the election, I must find another set B(S) with an element for each belief that you might have about the election. I'll call S the state space and B(S) the belief-state space. A solution to our problem is given by a mathematical operator B sending each state-space S to the matching belief-state space B(S).
One may feel prompted to ask: does any operator B suffice here? Can the belief-state space be anything whatsoever, or must it carry some extra structure, possibly satisfying some additional constraints? Or, stated more philosophically, can any territory serve as a map for any other? I say no. Roughly speaking, the operator B must be a so-called monad, which will be the central object of this essay. But more on that later.
The first thing to note is that the appropriate operator B will depend on how exactly I wish to characterise a "belief" about the election, and there are multiple options here. For example, I might choose to characterise your belief by the set of candidates that you think have a possibility of winning. In this case, B(S):=P+(S), denoting the set of non-empty subsets of S. Alternatively, I might choose to characterise your belief by the likelihood that you give each candidate. In this case, B(S):=Δ(S), denoting the set of finite-support probability distributions over S, i.e. functions p:S→[0,1] such that {s∈S:p(s)≠0} is finite and ∑s∈Sp(s)=1.
In the first option, I'm characterising your belief-state by your possibilistic uncertainty, often encountered in doxastic or epistemic logic. In the second option, I'm characterising your belief-state by your probabilistic uncertainty, which is a finer-grained characterisation of belief because it differentiates between e.g. thinking a coin is fair and thinking a coin is slightly biased.
The second option has its merits. Indeed, many readers will instinctively reach for Δ as soon as they hear the word "uncertainty", and this instinct would serve them well. There's been a fruitful enterprise (in philosophy, mathematics, computer science, linguistics, etc) of replacing possibilistic uncertainty with probabilistic uncertainty in any model or concept where one finds it. But I want to note that both P+ and Δ would count as a solution to the problem. I'll return to these two examples throughout this essay because they are the flavours of uncertainty which will be most familiar to the reader.
As we will see, these two operators, P+ and Δ, are both monads. The central claim of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. By "flavour of uncertainty" I mean a particular way of characterising someone's potentially uncertain belief about something. Possibilistic and probabilistic are paradigm cases, but in this essay we'll meet fifteen examples.
The forward implication of this claim, that every flavour of uncertainty is a monad, is perhaps uncontroversial in some circles.[1] The backward implication, that every monad is a flavour of uncertainty, is worthy of more scepticism.
In this essay —
Don't worry if you don't yet know what monads are. By the end of this essay you'll understand them as well as I do, which is enough to nod along when you hear "monad this" and "monad that".
The correspondence, explicitly.
What's a flavour of uncertainty?
Recall from the introduction that I'm tasked with representing or modelling both the election itself and your belief about the election. The first step of this task is to settle on a particular flavour of uncertainty to characterise the belief-states — possibilistic, probabilistic, infrabayesian, etc. One might ask, of this flavour of uncertainty, the following four questions —
What counts as a distinct belief about the election? Concretely, if there are n electoral candidates then how many distinct belief-states are there?
If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
These four questions — Count? Certainty? Collapse? Combine? — are essentially epistemological questions, and they collectively pin down what I mean by a flavour of uncertainty.[2] As we will see, a monad corresponds to answers to the first three questions and a commutative monad corresponds to answers to all four questions.
Exercise 1: How would you answer these questions for possibilistic uncertainty? Or for probabilistic uncertainty?
Exercise 2: As I mentioned before, an answer to Count? is a set B(S) for each set S. What about for Certainty? Collapse? and Combine?
What's a (commutative) monad?
Monads were born of category theory — a field of mathematics which many regard as arcane, mystical, or downright kabbalistic — but monads can (I think) be understood by someone lacking any acquaintance with category theory whatsoever. Indeed, my claim in this essay is that monads correspond exactly to Map-Territory-like relations, and such relations will be familiar to anyone who's both got a brain and pondered this predicament.
I'll first write down the mathematical definition of a monad, and then I'll explain how this definition mirrors the four epistemological questions.
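Here is that definition, sketched as Haskell rather than category-theoretic notation. This rendering is my own: the names ret, bind, and prod are mine, standing for the return, bind, and product operators discussed below, and the laws appear only as comments.

    {-# LANGUAGE KindSignatures #-}

    -- Count? is answered by the type constructor b itself, taking a
    -- state-space s to a belief-state-space (b s).
    class BeliefMonad (b :: * -> *) where
      -- Certainty?  ret embeds each state as the certain belief-state.
      ret  :: s -> b s
      -- Collapse?  bind flattens a second-order belief to a first-order one.
      bind :: b w -> (w -> b s) -> b s
      -- Combine?  prod pairs beliefs about unrelated systems
      -- (available only when the monad is commutative).
      prod :: b a -> b c -> b (a, c)

    -- The monad laws, which Haskell cannot check for us:
    --   bind (ret s) f    == f s
    --   bind m ret        == m
    --   bind (bind m f) g == bind m (\w -> bind (f w) g)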
How do they correspond to each other?
In short, there is an exact correspondence between the operators of a (commutative) monad and the four epistemological questions. Let's go one-by-one.
Count?
An answer to this question is the constructor operator, assigning a set B(S) to each set S. If S is the set of potential outcomes of an event then B(S) is the set of beliefs about the event.
As we discussed before, for possibilistic uncertainty B(S):=P+(S), and for probabilistic uncertainty B(S):=Δ(S).
Certainty?
Here, an answer will be the return operator assigning a function ηS:S→B(S) to each set S. If you're certain that a state s∈S will occur, then ηS(s)∈B(S) is your belief-state.
For possibilistic uncertainty, ηS(s):={s}∈P+(S), the singleton set containing s. And for probabilistic uncertainty, ηS(s):=δs∈Δ(S), the Dirac distribution at s, given by δs:s′↦1 if s′=s and 0 otherwise.
The function ηS:S→B(S) describes how the state-space embeds in the belief-state-space. This is related, I think, to the idea that each territory can serve as its own map. (See Borges' On Exactitude in Science for an exploration of this theme.) Or in the words of Norbert Wiener, “The best model of a cat is another, or preferably the same, cat.”
Collapse?
Here, an answer will be the bind operator assigning a function ⊳WS:B(W)×(W→B(S))→B(S) to each pair of sets W and S. You should think of the bind operator as collapsing your second-order beliefs to your first-order beliefs — i.e. if each forecaster w∈W has a first-order belief f(w)∈B(S), and w∈B(W) is your second-order belief about which forecaster is correct, then (w⊳WSf)∈B(S) should be your first-order belief about the election.
For possibilistic uncertainty, w⊳f∈P+(S) is the union ⋃w∈w f(w). And for probabilistic uncertainty, w⊳f∈Δ(S) is the summation/integral s′↦∑w∈W w(w)⋅f(w)(s′).
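To make these two cases concrete, here's a minimal Haskell sketch of the return and bind operators for both flavours. I'm using plain lists as stand-ins for nonempty sets and finite-support distributions, and skipping the bookkeeping (nonemptiness, merging duplicate outcomes, weights summing to 1) that a careful implementation would enforce; the function names are mine.

    import Data.List (nub)

    -- Possibilistic: a belief-state is a (nonempty) list of live candidates.
    type Poss s = [s]

    retP :: s -> Poss s
    retP s = [s]                        -- the singleton set {s}

    bindP :: Eq s => Poss w -> (w -> Poss s) -> Poss s
    bindP ws f = nub (concatMap f ws)   -- the union of f(w) over w in ws

    -- Probabilistic: a belief-state is a list of (outcome, probability) pairs.
    type Dist s = [(s, Double)]

    retD :: s -> Dist s
    retD s = [(s, 1)]                   -- the Dirac distribution at s

    bindD :: Dist w -> (w -> Dist s) -> Dist s
    bindD ws f = [ (s, p * q) | (w, p) <- ws, (s, q) <- f w ]
                                        -- sum over forecasters of w(w) * f(w)(s)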
This is related to the idea that a map of a map of a territory is a map of that same territory; a depiction of a depiction of a person is a depiction of that same person; a representation of a representation of an idea is a representation of that same idea; etc.
One might think of f:W→B(S) as some parameterisation of the belief-state-space B(S) using some parameters W. Then the bind operator gives us the function for finding your S-belief from your W-belief. Explicitly, this function is (−⊳WSf):B(W)→B(S), w↦w⊳WSf.
Moreover, the bind operator doesn't just flatten one level of "meta". Often we have an entire hierarchy of state-spaces S0,S1,S2,…,Sn where beliefs about Si are parameterised by some "higher" state-space Si+1 via a function fi:Si+1→B(Si). Here, the state-space S0 is the object-level system, the state-space S1 parametrises your first-order beliefs about S0, the state-space S2 parameterises your second-order beliefs about S1, and so on. Then the bind operator says that I can collapse your nth-order beliefs all the way to your first-order beliefs via the function (−⊳fn−1⊳⋯⊳f0):B(Sn)→B(S0).[4]
Combine?
An answer will be the product operator ⊗ assigning a function ⊗AB:B(A)×B(B)→B(A×B) to each pair of sets A and B. If a∈B(A) is your belief about the first election and b∈B(B) is your belief about an unrelated second election, then a⊗ABb∈B(A×B) is your belief about the pair of elections.
For possibilistic uncertainty, a⊗b∈P+(A×B) is the cartesian product {(a,b)∈A×B:a∈a,b∈b}. And for probabilistic uncertainty, a⊗b∈Δ(A×B) is the joint distribution (a,b)↦a(a)⋅b(b).
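And the matching product operators, as a Haskell sketch under the same list-based stand-ins as before:

    type Poss s = [s]
    type Dist s = [(s, Double)]

    -- Cartesian product: every pair of live candidates is live.
    prodP :: Poss a -> Poss b -> Poss (a, b)
    prodP as bs = [ (a, b) | a <- as, b <- bs ]

    -- Independent joint distribution: probabilities multiply.
    prodD :: Dist a -> Dist b -> Dist (a, b)
    prodD as bs = [ ((a, b), p * q) | (a, p) <- as, (b, q) <- bs ]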
Thinking of S1×⋯×Sn as a factorisation of the state-space S, the product operator implies that your beliefs about each Si combine to yield your overall belief about S. That is, a commutative monad B corresponds to a flavour of uncertainty that you can have to parts of the world, whereas a non-commutative monad B corresponds to a flavour of uncertainty that you can only have to the world in its entirety.
Historical note: The central thesis of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. I call this Myers' correspondence after David Jaz Myers, because I first encountered the idea in his book Categorical Systems Theory, where he devotes a chapter to using commutative monads to model various kinds of nondeterminism in automata. Nonetheless, the idea did not originate with him, he's never claimed it is true, and I don't know if he agrees with it.
Examples of Myers' correspondence
The correspondence between the operators of the (commutative) monad and the epistemological questions also serves as a practical recipe for formalising different flavours of uncertainty using monads. I've personally found it useful. First, think about the particular flavour of uncertainty, then answer the Four C's (Count? Certainty? Collapse? Combine?), convert those answers into mathematical operators, and voilà, you've got yourself a monad.
I'll now zoom through fifteen examples, beginning (without commentary) with the paradigm examples of P+ and Δ.
1 — nonempty powerset monad
2 — distribution monad
3 — reader monad from H
Okay, now let's deal with a flavour of uncertainty which is sometimes called "indeterminacy". An indeterminate belief is something like "Well, if h1 is true then x1, but if h2 is true then x2, but–", i.e. it's a belief which is uncertain because your best guess depends on some unknown variable. More formally, your belief-state is given by a particular function from H (the possible values of the unknown variable) to S (the state-space).
This is an ordinary usage of the word "uncertain" so, by Myers' correspondence, it must correspond to a monad, and we can discover which monad by answering the four Cs. If S is the state-space then the belief-state-space is given by SH, the set of functions s:H→S. So our constructor operator is (−)H. If you're certain that the outcome is s∈S then your belief-state is the constant function cs:h↦s. The intuitive answers to Collapse? and Combine? give us our bind and product operators.
Overall, we get what's called the reader monad from H.
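As a Haskell sketch (function names mine):

    -- A belief-state is a function from hypotheses h to states s.
    type Reader h s = h -> s

    retR :: s -> Reader h s
    retR s = \_ -> s              -- certainty: the constant function c_s

    bindR :: Reader h w -> (w -> Reader h s) -> Reader h s
    bindR m f = \h -> f (m h) h   -- consult the same hypothesis h throughout

    prodR :: Reader h a -> Reader h b -> Reader h (a, b)
    prodR m n = \h -> (m h, n h)  -- commutative: combine pointwise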
4 — writer monad to [0,1]
Often, people will report their uncertain beliefs like "The coin will land heads (98%)" or "AI will disempower humanity (60%)". That is, their belief is a best guess paired with their confidence, which they offer as a lower-bound on the likelihood that their guess is correct. A certain belief-state would be something like "The coin will land heads (100%)".
What monad corresponds to this flavour of uncertainty?
If S is the state-space then S×[0,1] is the belief-state-space, i.e. there's a distinct belief-state for each pair s=(s,q)∈S×[0,1]. If you're certain that the outcome is s∈S then your belief-state is (s,100%)∈S×[0,1]. Uncertainty is collapsed by multiplying the confidences. Uncertainty is combined also by multiplying the confidences.
Ta-da! The writer to [0,1] monad.
Using the writer to [0,1] monad, we've characterised a belief-state as an outcome marked with some additional metadata, namely a confidence p∈[0,1]. What properties of the interval [0,1] did we appeal to in this definition? Well, firstly that we can multiply different elements (see bind and product operators). And secondly, that there's a fixed element such that multiplying with this element does nothing (see return operator).
Hence we can generalise: given any monoid (M,e,⊙) we have a monad B(S)=S×M called the writer-to-M monad.[5] By using different monoids, we can model different flavours of uncertainty, but note that this is only a commutative monad when (M,e,⊙) is a commutative monoid.
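Here's a Haskell sketch of the writer-to-M monad, with the monoid passed around explicitly as a record so that different metadata can be swapped in; the confidence example instantiates M:=([0,1],1,·). Names are mine.

    -- A monoid: a unit element and an associative multiplication.
    data Mon m = Mon { unit :: m, times :: m -> m -> m }

    -- A belief-state is a best guess tagged with metadata from m.
    type Writer m s = (s, m)

    retW :: Mon m -> s -> Writer m s
    retW mo s = (s, unit mo)      -- certainty carries the unit metadata

    bindW :: Mon m -> Writer m w -> (w -> Writer m s) -> Writer m s
    bindW mo (w, p) f = let (s, q) = f w in (s, times mo p q)

    -- The confidence monoid ([0,1], 1, *):
    confidence :: Mon Double
    confidence = Mon { unit = 1.0, times = (*) }

    -- e.g. bindW confidence ("heads", 0.98) (\_ -> ("I win", 0.6))
    --      == ("I win", 0.588)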
There's another ordinary usage of the word "uncertainty" where an uncertain belief would be something like "AGI arrives before 2040 unless there's a nuclear war" and a certain belief would be something like "AGI arrives before 2040." At least, with regards to the binary question of whether AGI arrives before 2040. That is, an uncertain belief is one with an "unless..." clause.
Formalising this, we have a fixed set of events F, and a belief-state is a pair (s,E)∈S×F. Your belief-state is (s,E) when you commit to the state s∈S occurring unless the event E∈F occurs. This flavour of uncertainty corresponds to the writer monad B(S)=S×F, where F is a monoid when equipped with union ∪:F×F→F and the empty set ∅∈F.
One might use this flavour of uncertainty to model various kinds of defeasible reasoning, where a belief-state (s,E) is characterised by the precondition E under which the belief would be defeated or disavowed.
Or maybe an uncertain belief is one full of amendments, clarifications, conditions, disclaimers, excuses, hedges, limitations, qualifications, refinements, reservations, restrictions, stipulations, temperings, etc. By contrast, a certain belief is made "with no ifs or buts", bare and direct.
Formalising this, we have a fixed set of clarifications C, and a belief-state is a pair (s,l)∈S×List(C). Here, List(C) is the free monoid over the set of clarifications C equipped with concatenation +:List(C)×List(C)→List(C) and the empty list []∈List(C).
Now, the writer to List(C) monad isn't a commutative monad. Or interpreted philosophically, a clarified guess isn't the kind of uncertainty you can have to parts of the world. Suppose "I think Alice is happy but I don't know her very well" is my belief-state about Alice, and "I think Bob is happy but he's difficult to read" is my belief-state about Bob. What's my belief-state about both Alice and Bob? Is it (1) "Alice and Bob are both happy, but I don't know Alice very well and Bob is difficult to read" or (2) "Alice and Bob are both happy, but Bob is difficult to read and I don't know Alice very well". That is, in which order should we combine the clarifications?
The instinctive trick is to declare that two belief-states are equal if the lists of clarifications are equal up-to-permutation — this implies that (1) and (2) are the same belief-state, which does seem intuitive to me. If we play this trick, then the resulting flavour of uncertainty is captured by the writer-to-N[C] monad, where N[C] is the free commutative monoid. This does indeed give a commutative monad!
5 — identity monad
If we're anticipating an election between n candidates, then the simplest way to characterise your belief about the election is by your best guess, with no additional information about how unsure you are. If S is the state-space then S is also the belief-state-space, i.e. there's a distinct belief-state for each s∈S. The set of belief-states is therefore equal (up to bijection) to the set of outcomes itself.
I'll admit that this flavour of uncertainty is somewhat degenerate — e.g. every belief-state is a certainty in some particular state — but it's worth including nonetheless. On some readings of Wittgenstein's Tractatus, this is his model of how language represents the world: our utterances stand in direct isomorphism with the states of affairs.
Anyway, answering the four Cs would give the identity monad!
6 — maybe monad
The last example was a bit silly, so how about this instead?
If we're anticipating an election between n candidates, then I'll characterise your belief about the election either by your best guess (with no additional information) or by an "I don't know" response. This is a very coarse-grained flavour of uncertainty — the only belief-state about the election (other than certainty in a particular candidate) is the belief-state of utter cluelessness, or shrugging one's shoulders!
Despite the coarse-grainedness, it's pretty commonly encountered in the wild. For example, it's the typical flavour of uncertainty encountered in surveys/questionnaires, where ⊥ is read as "no opinion/don't know". It's also encountered in voting, where ⊥ is read as "abstention".
Formally speaking, if S is the state-space then there's a distinct belief-state for each state s∈S plus an additional option denoted ⊥. The belief-state-space is therefore S+1, denoting the disjoint union of S with the singleton set {⊥}. If you're certain that the outcome is s∈S then your belief-state is s∈S. This flavour of uncertainty corresponds to the famous maybe monad.
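A Haskell sketch, with DontKnow standing in for ⊥ (names mine; this mirrors Haskell's built-in Maybe type):

    -- A belief-state is either a best guess or a shrug.
    data Belief s = Guess s | DontKnow         -- DontKnow plays the role of ⊥

    retM :: s -> Belief s
    retM = Guess

    bindM :: Belief w -> (w -> Belief s) -> Belief s
    bindM DontKnow  _ = DontKnow               -- cluelessness propagates
    bindM (Guess w) f = f w

    prodM :: Belief a -> Belief b -> Belief (a, b)
    prodM (Guess a) (Guess b) = Guess (a, b)
    prodM _         _         = DontKnow       -- one shrug spoils the pair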
7 — K-distribution monad
You might, at this point, feel short-changed. I've discussed so far a range of flavours of uncertainty which are all coarser-grained than probabilistic uncertainty, so why not stick to Δ? Let's consider then a more fine-grained characterisation of belief-state, one that tracks infinitesimal differences between probability assignments.
The Levi-Civita field is an extension of the real numbers which contains infinitesimal values like ϵ, ϵ2, 2ϵ+ϵ2, π2√ϵ and infinite values like ϵ−1, ϵ−2, ϵ1/3+ϵ−1/3+2. We can replace [0,1] in the definition of Δ with LCF to obtain a monad ΔLCF corresponding to this flavour of uncertainty. On this account, a belief-state x∈ΔLCF(X) is something which tracks the potentially infinitesimal likelihood x(x)∈LCF of each outcome x∈X. This flavour of uncertainty has applications in infinite ethics and cooperation in large worlds.
For example, in a universe with infinite radius ϵ−1, what's your prior likelihood that you occupy the most central galaxy? Presumably, the likelihood should be ϵ3/ρ∈LCF, where ρ∈R+ is the density of galaxies.
Now suppose you were offered a lottery which promises to benefit everyone by δ if you indeed occupy the most central galaxy but otherwise benefits no one. What's this lottery worth? Presumably, it's worth δ, because the infinitary stakes δ⋅ρ⋅ϵ−3 are cancelled out by the infinitesimal chance of winning ϵ3/ρ.
Note that because LCF is totally-ordered, once we assign LCF values to different lotteries, we can perform expected utility maximisation as usual, and get sensible results. I think that infinitesimal probabilities resolve some (but not all) problems in infinite ethics. I'm particularly lured by the hope that, in an infinite cosmos, the infinitary stakes might somehow cancel out with infinitesimal probabilities to yield finite values. See Joe Carlsmith's essay On Infinite Ethics for further discussion.
How far can one generalise the kind of entity that a "probability" must be, before our definition breaks? Well, so long as we have some rig K, we can define a monad ΔK by replacing [0,1] with K. A rig is a set K equipped with a zero element 0∈K, a unit element 1∈K, an addition function ⊕:K×K→K, and a multiplication function ⊗:K×K→K, satisfying certain algebraic laws. By choosing different rigs K then we obtain different monads ΔK corresponding to different flavours of uncertainty.
When K:=[0,1] we obtain the ordinary probability distributions, and when K:=Q∩[0,1] we obtain the rational probability distributions, etc. Tobias Fritz suggests that by using similar tricks we might obtain quantum uncertainty, fuzzy uncertainty, and Dempster–Shafer uncertainty, but I haven't checked whether this is true.
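Here's a Haskell sketch of ΔK with the rig passed around explicitly. As before I use association lists and don't merge duplicate outcomes; merging is where the rig's addition would earn its keep. Names are mine.

    -- A rig, packaged as a record. 'plus' is what you'd use to merge
    -- duplicate outcomes, which this sketch doesn't bother doing.
    data Rig k = Rig { zero :: k, one :: k
                     , plus :: k -> k -> k, mult :: k -> k -> k }

    -- A K-valued distribution: outcomes paired with K-valued likelihoods.
    type DistK k s = [(s, k)]

    retK :: Rig k -> s -> DistK k s
    retK rig s = [(s, one rig)]

    bindK :: Rig k -> DistK k w -> (w -> DistK k s) -> DistK k s
    bindK rig ws f = [ (s, mult rig p q) | (w, p) <- ws, (s, q) <- f w ]

    -- Ordinary probability is recovered from the rig ([0,1], 0, 1, +, *):
    realRig :: Rig Double
    realRig = Rig { zero = 0, one = 1, plus = (+), mult = (*) }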
8 — quantum monad
For sure, quantum mechanics is endowed with its own flavour of uncertainty, hence the term Heisenberg's Uncertainty Principle. It's not impossible to catch a physicist saying "it's uncertain whether the qubit is 0 or 1" or "it's uncertain whether the cat is alive or dead", regardless of whether they consider quantum uncertainty as strictly speaking epistemic. By Myers' correspondence, this flavour of uncertainty must correspond to a monad.
Exercise 3: Which?[6]
9 — smooth state monad
The position of the North Star in the night sky is constant, static, immutable, certain; the position of Mercury, by contrast, is variable, dynamic, mutable, uncertain. Is this not a common sense of the word? Might one not say that my belief-state about Mercury's position will forever be uncertain, no matter how accurate my telescope or exhaustive my calculations, because my belief is always revised? If so, then by Myers' correspondence this flavour of uncertainty corresponds to a monad.
To formalise this, let's fix a differentiable manifold Θ parameterising your internal mental state as you think about a question. Note that because Θ is a differentiable manifold, it's equipped with tangent space Tθ at every θ∈Θ.
If S is the state-space, then ∏θ∈Θ(S×Tθ) is your belief-state-space. In other words, we have a distinct belief-state for each smooth transition function s∈∏θ∈Θ(S×Tθ). A belief-state s is characterised by a pair s(θ)=(s,v) for each θ∈Θ, where s∈S is your current guess and v∈Tθ is the tangent vector describing how your mental state is evolving. If you're certain that the winner is s∈S then your belief-state is the static transition function η(s):θ↦(s,0) where 0∈Tθ is the zero vector.
This is the smooth state monad — it's a differentiable version of the discrete-time state monad, with the additional benefit that it's a commutative monad.
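The smooth version is awkward to render as code, but its discrete-time cousin is easy, so here's a Haskell sketch of that instead (names mine). Note that the discrete state monad is famously not commutative, since the order of mental updates matters, which is exactly why the commutativity of the smooth version is worth remarking on.

    -- A belief-state consumes your current mental state and yields a guess
    -- plus your next mental state (the discrete analogue of a tangent vector).
    type State theta s = theta -> (s, theta)

    retS :: s -> State theta s
    retS s = \th -> (s, th)       -- certainty never revises your mental state

    bindS :: State theta w -> (w -> State theta s) -> State theta s
    bindS m f = \th -> let (w, th') = m th in f w th'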
10 — continuation monad
What are belief-states actually for anyway? What role do they play in rational decision-making? According to one school of thought, belief-states are simply gadgets for taking expected values, and chiefly for taking expected utility values.
Let's say S is the set of candidates running in the election, and v:S→R is your utility function, i.e. v(s)∈R measures how happy you'd be to hear that the candidate s∈S has won. Then your ex-ante utility is some r∈R measuring how happy you are now in anticipation of the outcome. Given your belief-state, I should be able to determine r∈R from v:S→R, which implies that I can just characterise your belief-state about the election by how r∈R is determined from v:S→R. Neat.
This is formalised by the so-called continuation to R monad. If S is the state-space then K(S,R) is the belief-state-space, where K(S,R) is the set of functionals s:(S→R)→R. And a belief-state s:(S→R)→R is certain in the outcome s∈S if s determines your ex-ante utility simply by evaluating your utility function at s, i.e. s=λv:S→R.v(s).
The continuation monad encompasses both possibilistic uncertainty and probabilistic uncertainty. If the nonempty subset A∈P+(X) models your possibilistic uncertainty then the associated functional x∈K(X,R) is given by λv:X→R.min{v(x):x∈A}. If the distribution μ∈Δ(X) models your probabilistic uncertainty then the associated functional x∈K(X,R) is given by λv:X→R.Ex∼μ[v(x)].
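In Haskell, fixing R:=Double, the monad and both embeddings look like this (names mine; fromPoss assumes its set is nonempty):

    -- A belief-state is a gadget turning utility functions into ex-ante utilities.
    type Cont s = (s -> Double) -> Double

    retC :: s -> Cont s
    retC s = \v -> v s                       -- certainty: evaluate v at s

    bindC :: Cont w -> (w -> Cont s) -> Cont s
    bindC m f = \v -> m (\w -> f w v)

    -- Possibilistic uncertainty embeds as worst-case value over the set...
    fromPoss :: [s] -> Cont s
    fromPoss xs = \v -> minimum (map v xs)

    -- ...and probabilistic uncertainty embeds as expected value.
    fromDist :: [(s, Double)] -> Cont s
    fromDist xs = \v -> sum [ p * v s | (s, p) <- xs ]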
Exercise 4: (Beginner) Prove that the two maps P+(X)→K(X,R) and Δ(X)→K(X,R) are injections. (Advanced) Prove these injections are monad transformers.[8]
11 — signature monad
Maybe I should characterise your belief-state about something by the sentence that you'd utter about the outcome. This will result in a more syntactic or linguistic account of belief. You might imagine here a shared language, like English or Python, with which a speaker may report their beliefs to a friend. Or you might imagine a private mental language in which a brain/AI will store their knowledge about the world.
To make this rigorous, I must introduce a language containing all the sentences that you might utter about the outcome. Our language will include an atomic sentence ┌s┐ for every outcome s∈S, along with certain connectives for combining sentences. For example, suppose we have a language with two symbols, a binary connective ∨ called disjunction and a unary connective ¬ called negation. If S:={s1,…,sn} are the candidates in an election, then a belief-state about the electoral outcome is a sentence like ┌s5┐ or ┌(s2∨¬s3)∨¬(s4∨s6)┐.
The logical connectives can be specified by a signature (Σ,arity:Σ→N). A signature is a set Σ equipped with a map arity:Σ→N sending each connective to its arity. So the aforementioned language has the signature Σ={∨,¬} with arity(∨)=2 and arity(¬)=1.
We denote the resulting set of sentences by L(Σ,S). This is a set containing all the sentences freely generated from S using the connectives in Σ. Explicitly, L(Σ,S) is the smallest set such that ┌s┐∈L(Σ,S) for every s∈S, and ┌σ(ϕ1,…,ϕk)┐∈L(Σ,S) for every σ∈Σ with arity(σ)=k and ϕ1,…,ϕk∈L(Σ,S).
With this machinery in place, we can answer the Four C's, and thereby find the corresponding monad.
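In Haskell, L(Σ,−) is the familiar free monad construction, with the signature packaged as a functor; bind becomes substitution of sentences for atoms. A sketch (names mine):

    -- Sentences freely generated from atoms in s by connectives in sig.
    data Term sig s = Atom s | Op (sig (Term sig s))

    retT :: s -> Term sig s
    retT = Atom                              -- the atomic sentence ⌜s⌝

    -- Collapse? is substitution: replace each atom w by the sentence f(w).
    bindT :: Functor sig => Term sig w -> (w -> Term sig s) -> Term sig s
    bindT (Atom w) f = f w
    bindT (Op t)   f = Op (fmap (\u -> bindT u f) t)

    -- The signature {∨ (arity 2), ¬ (arity 1)} from the example above:
    data Sigma t = Or t t | Not t
    instance Functor Sigma where
      fmap g (Or a b) = Or (g a) (g b)
      fmap g (Not a)  = Not (g a)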
Many monads are equivalent to L(Σ,−) for some signature Σ, including many monads we've already encountered.
Isn't the archetypal symbol of uncertainty... a fork in the road? Imagine a traveller facing two paths, left and right, each forking further ahead, and so on unboundedly, forming a fractal canopy of binary choices.
12 — algebraic theory
There's something a bit perverse about characterising your belief-state with a single utterance about the outcome. Namely, some utterances will be logically equivalent to each other, such as ┌ϕ┐ and ┌(ϕ∨ϕ)┐, and therefore the belief-state in which you're willing to utter ┌ϕ┐ is the exact same as the belief-state in which you're willing to utter ┌(ϕ∨ϕ)┐, assuming that you're both rational and honest. Therefore, our previous characterisation was overcounting the belief-states by distinguishing logically-equivalent sentences. Bizarrely, there would be infinitely-many belief-states about a single coin flip — i.e. ┌H┐, ┌(H∨H)┐, ┌(H∨(H∨H))┐, and so on.
To fix this, what we need isn't just a signature Σ, but rather a signature Σ paired with a set E of equational axioms, which is called an algebraic theory. An equational axiom is a pair of sentences built using the connectives in Σ and some placeholder sentence variables {a,b,c,…}. We use E to define an equivalence relation ∼E on L(Σ,X) by taking the deductive closure of the axioms, and then the equivalence classes of the sentences will be our belief-states.
For example, if our signature is {∨} and we intend to interpret the ∨ connective as disjunction, then E should consist of three axioms: a∨a=a (idempotence), a∨b=b∨a (commutativity), and a∨(b∨c)=(a∨b)∨c (associativity).
Furnished with the concept of an algebraic theory, we can now improve our answers:
If a monad B is equivalent to L(Σ,E,−) for some algebraic theory (Σ,E) then we call (Σ,E) a presentation of the monad.[11] A presentation of a monad is a rather nice description of a flavour of uncertainty via some operators for defining belief-states in terms of other belief-states and some rules governing those operators.
Exercise 5: Find a presentation for ΔK for an arbitrary rig K.
13 — convex powerset of distributions monad
As we saw before, the continuation monad K(−,R) encompasses both possibilistic and probabilistic uncertainty. Unfortunately K(−,R) lacks any presentation, even if we allow connectives with infinite arity![12] Fortunately, there exists a monad encompassing both possibilistic and probabilistic uncertainty which is presentable.
Recall that the nonempty finite powerset monad P+f, which corresponds to possibilistic uncertainty, is presented by the theory of semilattices (Σ1,E1). And the distribution monad Δ, which corresponds to probabilistic uncertainty, is presented by the theory of convex algebras (Σ2,E2). Consider the theory (Σ1∪Σ2,E1∪E2∪D) where D={a+p(b∨c)=(a+pb)∨(a+pc)} is an additional axiom describing how the +p connectives distribute over the ∨ connective.
This new theory is a presentation of the convex powerset of distributions monad. This monad, denoted by C, corresponds to a flavour of uncertainty wherein a belief-state is a convex set of distributions, e.g. "The coin lands either heads (20-30%) or tails (70-80%)." (See credal sets.)
Now, we could have defined C in an entirely non-syntactic way, i.e. "C(X) is the set of nonempty finitely-generated convex-closed sets of finite-support distributions over X." But I think the syntactic definition, in terms of the algebraic theories for P+ and Δ, elucidates why C is a well-motivated unification of probabilistic and possibilistic uncertainty. We will employ a similar strategy for motivating infrabayesianism — roughly speaking, infrabayesianism is exactly what you get when you combine probabilistic and possibilistic uncertainty with reward.
The theory (Σ1∪Σ2,E1∪E2∪D) in full:
∨ is a semilattice, i.e.
a∨a=a
a∨b=b∨a
a∨(b∨c)=(a∨b)∨c
{+p:p∈(0,1)} is a convex algebra, i.e.
a+pa=a
a+pb=b+(1−p)a
(a+pb)+qc=a+p⋅q(b+((1−p)⋅q/(1−p⋅q))c)
+p distributes over ∨, i.e.
a+p(b∨c)=(a+pb)∨(a+pc)
And the intended reading:
┌x┐ is certainty in an outcome x∈X.
┌ϕ1∨ϕ2┐ is possibilistic uncertainty between ┌ϕ1┐ and ┌ϕ2┐.
┌ϕ1+pϕ2┐ is probabilistic uncertainty between ┌ϕ1┐ (with chance p) and ┌ϕ2┐ (with chance 1−p).
14 — free convex lattice monad
There's a common usage of the word "uncertainty", where the uncertainty is modulo strategic choice. For example, you might hear "Black is certain to win" from a chess commentator if Black can force a checkmate, or hear "the winner is still uncertain" from a poker commentator during the flop. By Myers' correspondence, this flavour of uncertainty — call it "ludic uncertainty" — must correspond to some monad, but which?
Consider the theory of convex lattices — with signature ΣG={∨,∧,0,1}∪{+p:p∈(0,1)} and axioms EG, spelled out alongside their intended reading at the end of this section.
Then G:=L(ΣG,EG,−) is a monad corresponding, I think, to the aforementioned flavour of uncertainty. It sends a set X to the set G(X), the free convex lattice over X. An element of G(X) should be read as a game-tree whose non-leaf nodes are either a free binary choice by White, a free binary choice by Black, or a biased coin flip. The leaf nodes may be either wins for White, wins for Black, or an element of the set X.
We treat game-trees g,g′∈G(X) as equivalent if the same outcome would result from g and g′ regardless of the players' preferences over the elements of X. For example, the lattice axioms ϕ∨0=ϕ and ϕ∧1=ϕ will hold because no player would willingly choose to lose, and the axioms ϕ∨(ϕ∧ψ)=ϕ and ϕ∧(ϕ∨ψ)=ϕ establish that the players are adversarial, i.e. would never willingly empower one another.
Exercise 7: Consider the game ┌((1∨x2)∧((x2+0.80)∨(x2∧1)))∧(x2∨(x5+0.5(x3∧x4)))┐. Which outcome is (ludically) certain?
Note that the elements of G(X) aren't really games in the usual sense, because leaf nodes might be elements of X, and we treat these elements as pairwise incomparable to both players. So you should think of G(X) as a set of partially-specified game trees. A fully-specified game tree would be an element of G([0,1]), which is a game tree where each leaf-node returns some [0,1]-valued utility to Black and disutility to White. You may notice that [0,1] can itself be equipped with the structure of a convex lattice, which just means there exists a G-algebra V:G([0,1])→[0,1].[14] This G-algebra is exactly the well-known minimax evaluation of game trees (extended with chance nodes) used in combinatorial game theory.
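Here's that evaluation as a Haskell sketch: a fully-specified game tree valued in [0,1], with White minimising, Black maximising, and chance nodes taking expectations. The constructor names are mine.

    -- A fully-specified game tree: an element of G([0,1]).
    data Game = Leaf Double                 -- utility to Black in [0,1]
              | White Game Game             -- ∧ : White's free choice
              | Black Game Game             -- ∨ : Black's free choice
              | Chance Double Game Game     -- +p : coin flip with bias p

    value :: Game -> Double
    value (Leaf u)       = u
    value (White g h)    = min (value g) (value h)  -- White minimises Black's utility
    value (Black g h)    = max (value g) (value h)  -- Black maximises it
    value (Chance p g h) = p * value g + (1 - p) * value h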
The theory (ΣG,EG) in full:
{∧,∨,0,1} is a lattice.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over both ∧ and ∨, i.e.
a+p(b∨c)=(a+pb)∨(a+pc)
a+p(b∧c)=(a+pb)∧(a+pc)
And the intended reading:
┌x┐ is a game which will certainly result in outcome x∈X.
┌0┐ is a game where White wins and ┌1┐ is a game where Black wins.
┌ϕ∧ψ┐ is a game where White can choose to play ϕ or to play ψ.
┌ϕ∨ψ┐ is a game where Black can choose to play ϕ or to play ψ.
┌ϕ+pψ┐ is a game where ϕ is played with chance p and ψ with chance 1−p.
15 — infrabayesianism
When agents have beliefs about the same environment that they're embedded in, weird things can happen. Over the past few years, Vanessa Kosoy and Alex Appel have been exploring a novel flavour of uncertainty — infrabayesian uncertainty — which they claim more fruitfully characterises the belief-states of embedded agents. In particular, it characterises belief-states concerning Newcomb-like environments, where the state of the environment is correlated with the agent's choice under consideration. Their flavour of uncertainty corresponds to the infrabayesian monad □.
Roughly speaking, □ is the same as G above except without the ∧ connective. Consider the theory (Σ,E) of convex semilattices with top and bottom, which is a presentation of the composite monad P+f∘Δ∘(−+2).[15] From what I understand, this monad P+f∘Δ∘(−+2) is Kosoy's infrabayesian monad □.[16] This justifies the claim that infrabayesianism is the flavour of uncertainty that minimally encompasses possibilistic uncertainty (via the P+f monad), probabilistic uncertainty (via the Δ monad), and reward (via the (−+{0,1}) monad). I think that this motivates infrabayesianism as a characterisation of an agent's belief-state about their environment.
The theory (Σ,E) in full:
∨ is a semilattice with a∨0=a and a∨1=1.
{+p:p∈(0,1)} is a convex algebra.
+p distributes over ∨, i.e.
a+p(b∨c)=(a+pb)∨(a+pc)
And the intended reading:
┌x┐ is an environment which certainly results in outcome x∈X.
0 is an impossible/contradictory environment where the agent achieves no disutility, called Nirvana.
1 is an environment where the agent suffers maximal disutility.
(ϕ∨ψ) is an environment which is either like ϕ or like ψ, and our agent should be pessimistic here.
(ϕ+pψ) is an environment which is like ϕ with chance p and like ψ with chance 1−p.
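To make the composite P+f∘Δ∘(−+2) concrete, here's a toy Haskell sketch of my own (not Kosoy's actual formalism, which is considerably more sophisticated): a belief-state is a set of distributions over outcomes extended with the two reward endpoints, evaluated pessimistically against a disutility function. All names are mine.

    -- Outcomes extended with the two reward endpoints.
    data Extended x = Outcome x | Nirvana | MaxDisutility

    -- A belief-state: a (nonempty) set of distributions over extended outcomes.
    type Infra x = [[(Extended x, Double)]]

    -- Pessimistic evaluation against a disutility function v : x -> [0,1]:
    -- the agent expects the worst (largest) expected disutility in the set.
    evalInfra :: Infra x -> (x -> Double) -> Double
    evalInfra bs v = maximum (map expected bs)
      where
        expected = sum . map (\(e, p) -> p * dis e)
        dis (Outcome x)   = v x
        dis Nirvana       = 0      -- the impossible branch costs nothing
        dis MaxDisutility = 1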
Unfortunately, □ isn't a commutative monad, which means it's not a flavour of uncertainty that you can have to parts of the world, but only to the world in its entirety. Put starkly, there's no way to combine my infrabayesian belief-states about two coin tosses to yield a single infrabayesian belief-state about the pair of coin tosses, even when the coin tosses are completely unrelated.[17] This, I think, limits both the theoretical appeal of infrabayesianism and its tractability.
Theoretically speaking, the fact that □ isn't a commutative monad weakens the analogy between infrabayesian uncertainty and possibilistic or probabilistic uncertainty. Many concepts built upon possibilistic or probabilistic uncertainty appeal, in an essential way, to the product operators ⊗P+ or ⊗Δ; infrabayesianism, lacking such an operator, is not guaranteed analogous concepts.
Practically speaking, the lack of an infrabayesian product operator is an obstacle to parallelising algorithms which assume infrabayesian belief-states. There is no way to decompose the environment into separate components, discover an infrabayesian belief-state for each component, and then combine those belief-states into a single belief-state about the environment as a whole.
Implications for AI safety
Does this essay have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this essay is maybe agent-foundations foundations, or something like that. But I feel compelled to offer some practical implications for AI safety to validate my decision to write this essay and your decision to read it.
Further questions
Insofar as "flavours of uncertainty" is an informal term, there's little we can do to test the correspondence other than enumerating well-known flavours of uncertainty and checking that they do in fact correspond to monads, and vice-versa, enumerating the well-known monads and giving them natural doxastic interpretations. I think my own attempt has been positive, but this verdict is open to revision.
Secondly, the biggest asterisk on my essay: my treatment of belief-states has been silent on their most important property, namely that they are learned. For example, a probability distribution can be conditioned on new evidence, and possibilistic uncertainty also carries an analogous notion of conditioning. Perhaps any characterisation of belief should answer additional questions about how those belief-states are revised in light of new evidence/observations/considerations, etc. Perhaps we should append to Count? Certainty? Collapse? Combine? a fifth question, Condition? I'm sympathetic to this worry.
And if indeed learning is a phenomenon which must be modelled by any characterisation of belief, then monads do not themselves carry enough structure to characterise beliefs. Rather, we would need to equip the monad B with some additional structure, perhaps a family of maps learnS:O(S)×B(S)→B(S) for some space of observations O(S), possibly satisfying some additional constraints such as learnS(o,−)∘ηS=ηS and learnS(o1⋅o2,−)=learnS(o2,−)∘learnS(o1,−). I'm just improvising here.
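Continuing to improvise, here's what learnS could look like for the distribution monad specifically: Bayesian conditioning on an observed event, under the assumption that the event has nonzero prior probability. This Haskell sketch is my own, not a worked-out proposal.

    -- A finite-support distribution, as before.
    type Dist s = [(s, Double)]

    -- learn: condition a belief-state on an observed event, then renormalise.
    -- Assumes the event has nonzero probability under the prior.
    learn :: (s -> Bool) -> Dist s -> Dist s
    learn event prior = [ (s, p / total) | (s, p) <- kept ]
      where
        kept  = [ (s, p) | (s, p) <- prior, event s ]
        total = sum (map snd kept)

    -- e.g. learn (/= "tails") [("heads", 0.5), ("tails", 0.5)]
    --      == [("heads", 1.0)]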
This is best left to future work, if the need arises.
Footnotes
[1] In particular, I'm thinking of the applied category theory community.
[2] Traditionally, the field of analytic epistemology has been concerned with defining epistemological concepts — i.e. constructing definitions for the concepts of knowledge, belief, evidence, learning, testimony, justification, etc. However, in recent years analytic epistemology has reorientated itself, chiefly under the influence of Timothy Williamson, towards modelling epistemological phenomena — i.e. constructing mathematical models for phenomena relating knowledge, belief, evidence, learning, testimony, justification, etc. This reorientation in epistemology, from concept-defining to model-building, was inspired by the natural sciences.
[3] An operator F assigns, to every set X, another set/function F(X).
For example, P is the powerset operator, which assigns to every set X another set P(X). You can informally think of an operator as a function — but strictly speaking, an operator can't be a function because its domain would be the "set of all sets" (which doesn't exist).
Formally, the domain of an operator is something called a category. Categories can be larger than sets — in particular there is a category containing all the sets and the functions between them. For pedagogical purposes, I've framed everything in this article in terms of sets and functions, but most of the content of this article can applied to any category with enough structure.
[4] And I suppose, by "generalising backwards", that my zeroth-order belief about the coin toss is the actual result of the coin toss..?
[5] (M,e,⊙) is a monoid if (a⊙b)⊙c=a⊙(b⊙c) and e⊙a=a=a⊙e. A monoid is like a group except the elements might not have inverses, e.g. (Z,0,+) is a group but (N,0,+) is only a monoid.