Fair upfront warning: This is not a particularly readable proof section. There's a bunch of dense notation, logical leaps due to illusion of transparency since I've spent months getting fluent with these concepts, and a relative lack of editing since it's long. If you really want to read this, I'd suggest PM-ing me to get a link to MIRIxDiscord, where I'd be able to guide you through it and answer questions. This post will be recapping the notions and building up an arsenal of lemmas, the next one will show the isomorphism theorem, translation theorems, and behavior of mixing, and the last one is about updates and the decision-theory results. It's advised to have them open in different tabs and go between them as needed.
With that said, let's establish some notation and mental intuition. I'll err on the side of including more stuff because illusion of transparency. First, visualize the tree of alternating actions and observations in an environment. A full policy π can be viewed as that tree with some branches pruned off, specifying every history that's possible with your policy of interest. All our policies are deterministic. A policy stub πst is a policy tree that's been mostly pruned down (doesn't extend further than some finite time n). A partial policy πpa is just any policy tree in any state of specification or lack thereof, from tiny stubs to full policies to trees that are infinite down some branches but not others.
π∅ denotes the empty policy (a stub) which specifies nothing about what a policy does, and π¬h is some partial policy which specifies everything (acts like a full policy) everywhere except on history h and afterwards.
There's a distance metric on histories, as well as a distance metric on partial policies. Both of them are of the form γt where γ<1, and t is the "time of first difference". For histories, it's "what's the first time these histories differ", for policies, it's "what's the shortest time by which one partial policy is defined and the other is undefined, or where the policies differ on what to do". So, thinking of the distance as getting smaller as the time of first difference gets bigger is a reliable guide.
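If it helps to see this computationally, here's a tiny Python sketch of the γt distance on histories (the function names and example histories are mine, purely for illustration); the distance on partial policies works the same way, just with "time of first difference" replaced by the first time the two trees disagree or one is defined where the other isn't.

```python
# Minimal sketch of the gamma^t distance on histories, assuming histories are
# given as tuples of symbols (e.g. alternating actions and observations).
GAMMA = 0.9  # any gamma < 1 works; 0.9 is an arbitrary choice

def time_of_first_difference(h1, h2):
    """First index where the histories disagree (or where one ends and the other continues)."""
    for t, (x, y) in enumerate(zip(h1, h2)):
        if x != y:
            return t
    if len(h1) != len(h2):
        return min(len(h1), len(h2))
    return float("inf")  # identical histories: gamma^inf = 0

def history_distance(h1, h2, gamma=GAMMA):
    return gamma ** time_of_first_difference(h1, h2)

if __name__ == "__main__":
    h1 = ("a0", "o1", "a1", "o0")
    h2 = ("a0", "o1", "a0", "o0")  # first difference at index 2
    print(history_distance(h1, h2))  # 0.9^2 = 0.81
    print(history_distance(h1, h1))  # 0.0
```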
The outcome set F(πpa) is... take the tree corresponding to πpa, and it's the set of all the observation leaf nodes and infinite paths. No matter what, if you're interacting with an environment and acting according to πpa, the history you get is guaranteed to have, as a prefix, something in F(πpa). FNF(πpa) is that same set but minus all the Nirvana observations. Nirvana is a special observation which can occur at any time, counts as infinite reward, and ends the history. This is our compact metric space of interest that we're using to define a-measures and sa-measures. We assume that there's only finitely many discrete actions/observations available at any given point in time.
In this setting, sa-measures and a-measures over FNF(πpa) are defined as usual (a pair of a signed measure m and a number b where b+m−(1)≥0 for sa-measures, and a measure m with no negative parts and b≥0, respectively), because there's no infinite reward shenanigans. Sa-measures over F(πpa) require a technicality, though, which is that no nirvana event can have negative measure. λ will denote the total amount of measure you have. So, for a probability distribution, λ will be 1. We'll just use this for a-measures, and talk freely about the λ and b values of an a-measure. We use the KR-metric for measuring the distance between sa-measures (or a-measures), which is like "if two measures are really similar for a long time and then start diverging at late times, they're pretty similar." It's also equivalent to the earthmover distance, which is "how much effort does it take to rearrange the pile-of-dirt-that-is-this-measure into the pile-of-dirt-that-is-that-measure."
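For intuition only, here's one way to sketch finitely-supported (signed) measures and the a-measure/sa-measure conditions in Python; the dict-of-histories representation and helper names are my own simplification, not anything from the actual formalism (which lives on the full compact space of infinite histories).

```python
# Intuition-only sketch: a finitely-supported signed measure as {history: mass}, plus a b term.

def total_measure(m):
    """lambda: the total amount of measure."""
    return sum(m.values())

def negative_mass(m):
    """Absolute value of m^-(1), the (nonpositive) total mass of the negative part."""
    return -sum(v for v in m.values() if v < 0)

def is_a_measure(m, b):
    """a-measure: the measure has no negative parts, and b >= 0."""
    return all(v >= 0 for v in m.values()) and b >= 0

def is_sa_measure(m, b):
    """sa-measure: b + m^-(1) >= 0, i.e. b at least cancels out the negative mass."""
    return b >= negative_mass(m)

if __name__ == "__main__":
    m = {("a0", "o0"): 0.7, ("a0", "o1"): 0.3}
    print(total_measure(m), is_a_measure(m, b=0.1))   # 1.0 True (a probability distribution, so lambda = 1)
    sm = {("a0", "o0"): 0.5, ("a0", "o1"): -0.2}
    print(is_sa_measure(sm, b=0.3), is_sa_measure(sm, b=0.1))  # True False
```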
One important note. While m(f) is "what's the expectation of the continuous function f over histories, according to the measure we have", we frequently abuse notation and use m(h) to refer to what technically should be "what's the expectation of the indicator function for "this history has h as a prefix" w.r.t the measure". The reason we can do this is because the indicator function for the finite history h is a continuous function! So we can just view it as "what's the measure assigned to history h". Similarly, f★hg is the continuous function that's f on histories with h as a prefix and g on histories without h as a prefix.
For a given Ma(F(πpa)) or the nirvana-free variant, NF is just the subset of that where the measure components of the a-measures assign 0 measure to Nirvana occurring. They're safe from infinite reward. We suppress the dependency on πpa. Similarly,
EB(f)=inf(m,b)∈B∩NF(m(f)+b)
because if a Nirvana-containing measure was selected by Murphy, you'd get infinite expected value, so Murphy won't pick anything with Nirvana in it. Keep that in mind.
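Here's that expectation spelled out on a toy finite B, reusing the dict representation from the sketch above; the nirvana-flagging convention and names are mine.

```python
# Toy sketch of E_B(f) = inf over nirvana-free (m, b) in B of m(f) + b.
# Histories are tuples; nirvana is flagged by the symbol "N" appearing in the history.

def is_nirvana_free(m):
    return all("N" not in h for h in m if m[h] > 0)

def expectation(m, f):
    """m(f): expectation of f over histories, per the finitely-supported measure m."""
    return sum(mass * f(h) for h, mass in m.items())

def E_B(B, f):
    """inf over (m, b) in B with m nirvana-free of m(f) + b."""
    values = [expectation(m, f) + b for (m, b) in B if is_nirvana_free(m)]
    return min(values) if values else float("inf")

if __name__ == "__main__":
    B = [
        ({("a", "o0"): 0.5, ("a", "o1"): 0.5}, 0.1),
        ({("a", "o0"): 1.0}, 0.0),
        ({("a", "N"): 1.0}, 0.0),  # contains nirvana, so Murphy never picks it
    ]
    f = lambda h: 1.0 if h[-1] == "o0" else 0.0
    print(E_B(B, f))  # min(0.5 + 0.1, 1.0 + 0.0) = 0.6
```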
There's a fiddly thing to take into account about upper completion. We're usually working in the space of a-measures Ma(F(πpa)) or the nirvana-free equivalent. But the variant of upper completion we impose on our sets is: take the nirvana-free part of your set of interest, take the upper completion w.r.t the cone of nirvana-free sa-measures, then intersect with a-measures again. So, instead of the earlier setting where we could have any old sa-measure in our set and we could add any old sa-measure to them, now, since we're working purely in the space of a-measures and only demanding upper closure of the nirvana-free part, our notion of upper completion is something more like "start with a nirvana-free a-measure, you can add a nirvana-free sa-measure to it, and adding them has to make a nirvana-free a-measure"
Even worse, this is the notion of upper completion we impose, but for checking whether a point counts as minimal, we use the cone of sa-measures (with nirvana). So, for certifying that a point is non-minimal, we have to go "hey, there's another a-measure where we can add an sa-measure and make our point of interest". A different notion of upper completion here.
And, to make matters even worse, sometimes we do arguments involving the cone of sa-measures or nirvana-free sa-measures and don't impose the a-measure restriction. I'll try to clarify which case we're dealing with, but I can't guarantee it'll all be clear or sufficiently edited.
There's a partial ordering on partial policies, which is πpa≥π′pa if the two policies never disagree on which action to take, and πpa is defined on more histories than π′pa is (is a bigger tree). So, instead of viewing a partial policy as a tree, we can view the set of partial policies as a big poset. The full policies π are at the top, the empty policy π∅ is at the bottom. Along with this, we've got two important notions. One is the fundamental sequence of a partial policy. Envisioning it at the tree level, πnpa is "the tree that is πpa, just cut off at level n". Envisioning it at the poset level, the sequence πnpa is a chain of points in our poset ascending up to the point πpa.
Also synergizing with the partial-order view, we've got functions heading down the partial policy poset. prπhipa,πlopa∗ is the function that takes an a-measure or sa-measure over F(πhipa), and is like "ok, everything in F(πhipa) has a unique prefix in F(πlopa), push your measure component down, keep the b term the same". A good way of intuiting this is that this sort of projection describes what happens when you crunch down a measure over 10-bit-bitstrings to a measure over 8-bit-bitstrings. So view your poset of partial policies as being linked together by a bunch of projection arrows heading down.
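Here's the author's 10-bit-to-8-bit intuition as a literal sketch (my names): push each history's mass onto its prefix in the lower outcome set, and carry b along unchanged.

```python
# Sketch of pr_*: push a measure over longer histories down to a measure over prefixes.
# Here the "lower outcome set" is simply "truncate to length k", mimicking the
# 10-bit-strings -> 8-bit-strings example; the b term is carried along unchanged.
from collections import defaultdict

def project(measure_and_b, k):
    m, b = measure_and_b
    m_lo = defaultdict(float)
    for history, mass in m.items():
        m_lo[history[:k]] += mass   # every history has a unique length-k prefix
    return dict(m_lo), b

if __name__ == "__main__":
    m_hi = {(0, 1, 1): 0.25, (0, 1, 0): 0.25, (1, 0, 0): 0.5}
    print(project((m_hi, 0.3), k=2))
    # ({(0, 1): 0.5, (1, 0): 0.5}, 0.3) -- lambda and b are preserved by projection
```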
There's a function Θ mapping each partial policy πpa to a set of a-measures over F(πpa) (or the nirvana-free variant), fulfilling some special properties. Maybe Θ is only defined over policy stubs or full policies, in which case we use Θst or Θω, respectively. So, the best mental visualization/sketching aid for a lot of the proofs is visualizing your partial policies of interest with an ordering on them where bigger=up and smaller=down, and having a set/point for each one; that organizes things fairly well and is how many of these proofs were created.
Every Θ (or the stub/full policy analogue) is associated with a λ⊙ and b⊙ value, which is "smallest upper bound on the λ of the minimal points of the Θ(πpa) sets" and "smallest upper bound on the b of the minimal points of the Θ(πpa) sets". Accordingly, the set {≤⊙} is defined as {(λμ,b)|λ+b≤λ⊙+b⊙}, and is a way of slicing out a bounded region of a set that contains all minimal points, if we need to do compactness arguments.
Finally, we'll reiterate two ultra-important results from basic inframeasure theory that get used a lot in here and will be tossed around casually for arguments without citing where they came from. There's the Compactness Lemma, which says that if you have a bound on the λ values and the b values of a closed set of a-measures, the set is compact. There's also Theorem 2, which says that you can break down any sa-measure into (minimal point + sa-measure), we use that decomposition a whole lot.
Other results we'll casually use are that projections commute (projecting down and then down again is the same as doing one big projection down), projections are linear (it doesn't matter whether you mix before or after projecting), projections don't expand distance (if two a-measures are ϵ apart before being projected down, they'll be ϵ or less apart after being projected down), if two a-measures are distinct in Ma(F(πpa)), then the measure components differ at some finite time (or the b terms differ), so we can project down to some finite πnpa (same thing, just end history at time n) and they'll still be different, and projections preserve the λ and b value of an a-measure.
One last note, Ma(∞) is the space of a-measures on nirvana-free histories. This is all histories, not just the ones compatible with a specific policy. And a surmeasure SM is like a measure, but it can assign 0+ value to a nirvana event, marking it as "possible" even though it has 0 (arbitrarily low) measure.
Now, we can begin. Our first order of business is showing how the surtopology/surmetric/surmeasures are made and link together, but the bulk of this is the Isomorphism theorem. Which takes about 20 lemmas to set up first, in order to compress all the tools we need for it, and then the proof itself is extremely long. After that, things go a bit faster.
Lemma 1: d(x,x′):=max(d1(x,x′),d2(x,x′)) is a metric if d1 is a metric and d2 is a pseudometric.
For identity of indiscernibles, d(x,x′)=0→d1(x,x′)=0→x=x′ because d1 is a metric, and in the reverse direction, if x=x′, then d2(x,x′)=0 (pseudometrics have 0 distance from a point to itself) and d1(x,x′)=0, so d(x,x′)=0.
For symmetry, both metrics and pseudometrics have symmetry, so
d(x,x′)=max(d1(x,x′),d2(x,x′))=max(d1(x′,x),d2(x′,x))=d(x′,x)
For the triangle inequality, both metrics and pseudometrics fulfill the triangle inequality, so
d(x,z)=max(d1(x,z),d2(x,z))≤max(d1(x,y)+d1(y,z),d2(x,y)+d2(y,z))
≤max(d1(x,y),d2(x,y))+max(d1(y,z),d2(y,z))=d(x,y)+d(y,z)
And we're done.
Lemma 2: The surmetric is a metric.
To recap, the surmetric over sa-measures is
ds((m,b),(m′,b′)):=max(d(m,m′)+|b−b′|,γt(m,m′))
where γ<1, and t(m,m′) is the minimum length of a Nirvana-containing history that has positive measure according to m and 0 measure according to m′ (or vice-versa). We'll show that γt(m,m′) acts as a pseudometric, and then invoke Lemma 1.
The first three conditions of nonnegativity, γt(m,m)=γ∞=0, and symmetry are immediate. That just leaves checking the triangle inequality. Let t1:=t(m1,m2), t2:=t(m2,m3), and t3:=t(m1,m3).
Assume t3<min(t1,t2). Then, going from m1 to m2, all changes in the possibility of a Nirvana-history take place strictly after t3, and going from m2 to m3, all changes in the possibility of a Nirvana-history also take place strictly after t3, so m1 and m3 behave identically (w.r.t. Nirvana-possibilities) up to and including time t3, which is impossible, because t3 is "what's the shortest time where m1 and m3 disagree on the possibility of a Nirvana-history". Therefore, t3≥min(t1,t2), and this case is ruled out.
In one case, either t1 or t2 is strictly less than t3. Without loss of generality, assume it's t1. Then, γt3<γt1≤γt1+γt2 and the triangle inequality is shown.
In the other case, min(t1,t2)=t3. Without loss of generality, assume t1=t3. Then, γt3=γt1≤γt1+γt2 and the triangle inequality is shown.
Therefore, γt(m,m′) is a pseudometric. Now, we can invoke Lemma 1 to show that ds is a metric.
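To see the two-piece structure of ds concretely, here's a sketch for finitely-supported measures, reusing the dict representation from earlier. Caveat: the d part below is a crude stand-in, not the actual KR/earthmover metric; only the γt(m,m′) piece is computed as defined.

```python
# Sketch of the surmetric d_s((m,b),(m',b')) = max(d(m,m') + |b-b'|, gamma^t(m,m')).
# stand_in_d is NOT the real KR/earthmover distance, just a placeholder so the
# max-of-two-pieces structure is visible; the gamma^t piece is what Lemma 2 is about.
GAMMA = 0.9

def stand_in_d(m1, m2):
    keys = set(m1) | set(m2)
    return sum(abs(m1.get(h, 0.0) - m2.get(h, 0.0)) for h in keys)

def t_of_first_nirvana_disagreement(m1, m2):
    """Minimum length of a nirvana-history that is possible under one measure but not the other."""
    keys = set(m1) | set(m2)
    lengths = [len(h) for h in keys if "N" in h
               and (m1.get(h, 0.0) > 0) != (m2.get(h, 0.0) > 0)]
    return min(lengths) if lengths else float("inf")

def surmetric(M1, M2, gamma=GAMMA):
    (m1, b1), (m2, b2) = M1, M2
    return max(stand_in_d(m1, m2) + abs(b1 - b2),
               gamma ** t_of_first_nirvana_disagreement(m1, m2))

if __name__ == "__main__":
    M1 = ({("a", "o"): 1.0}, 0.0)
    M2 = ({("a", "o"): 0.999, ("a", "N"): 0.001}, 0.0)
    # Nearly identical in the ordinary distance, but they disagree on whether a short
    # nirvana-history is possible, so the surmetric keeps them gamma^2 apart.
    print(surmetric(M1, M2))  # 0.81
```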
Theorem 1: The surmetric on the space of sa-measures Msa(F(πpa)) induces the surtopology. The Cauchy completion of Msa(F(πpa)) w.r.t the surmetric is exactly the space of sa-surmeasures.
Proof sketch: First, use the metric to get an entourage (check the Wikipedia page on "Uniform Space"), and use the entourage to get a topology. Then, we go in both directions, and check that entourage-open sets are open according to the surtopology and the surtopology subbasis sets are entourage-open, to conclude that the topology induced by the surmetric is exactly the surtopology. Then, for the Cauchy completion, we'll show a bijection between equivalence classes of Cauchy sequences w.r.t. the surmetric and sa-surmeasures.
The surmetric is ds((m,b),(m′,b′)):=max(d(m,m′)+|b−b′|,γt(m,m′)) where γ<1, and t is the minimum length of a Nirvana-containing history that has positive measure according to m and 0 measure according to m′ (or vice-versa)
From the Wikipedia page on "Uniform Space", a fundamental system of entourages for Msa(F(πpa)) is given by
{(M,M′)∈Msa(F(πpa))×Msa(F(πpa)):ds(M,M′)≤ϵ}
A set O is open w.r.t. the uniformity iff for all M∈O, there exists an entourage V where V[M] lies entirely within O (wikipedia page). Because V is a subset of Msa(F(πpa))×Msa(F(πpa)), V[M] is the set of all second components that get paired with M in V. So, let's say O is open w.r.t. the uniformity. Then, for all M∈O, there's an entourage V where V[M] lies entirely within O. A fundamental system of entourages has the property that every entourage is a superset of some set from the fundamental system. Thus, from our earlier definition of the fundamental system, there exists some ϵM where
{M′∈Msa(F(πpa)):ds(M,M′)≤ϵM}⊆O
We'll construct an open set from the surtopology that is a subset of this set and contains M, as follows. First, observe that if ds(M,M′)≤ϵM, then d(M,M′)≤ϵM, and γt(m,m′)≤ϵM. For the latter, there are finitely many nirvana-containing histories with a length less than logγ(ϵM)+1, and if an M′ matches M w.r.t. which nirvana-containing histories of that finite set are possible or impossible, then γt(m,m′)<ϵM (because M and M′ only differ on which Nirvana-histories are possible at very late times).
Accordingly, intersect the following sets:
1: the open ball centered at M with a size of ϵM
2: For all the nirvana-histories hN where |hN|≤logγ(ϵM)+1 and m(hN)>0, intersect all the sets of sa-measures where that history has positive measure. These are open because they're the complements of "this finite history has zero measure", which is a closed set of sa-measures.
3: For all the nirvana-histories hN where |hN|≤logγ(ϵM)+1 and m(hN)=0, intersect all the sets of sa-measures where that nirvana-history has 0 measure. These are open because they are subbasis sets for the surtopology.
We intersected finitely many open sets, so the result is open. Due to 2 and 3 and our earlier discussion, any M′ in the intersection must have γt(m,m′)<ϵM. Due to 1, d(M,M′)<ϵM.
This finite intersection of open sets (in the surtopology) produces an open set that contains M (obviously) and is a subset of {M′∈Msa(F(πpa)):ds(M,M′)≤ϵM}, which is a subset of V[M] which is a subset of O.
Because this argument can be applied to every point M∈O to get an open set (in the surtopology) that contains M and is a subset of O, we can make O itself by just unioning all our open sets together, which shows that O is open in the surtopology.
In the reverse direction, let's show that all sets in the subbasis of the surtopology are open w.r.t. the uniform structure.
First, we'll address the open balls around a point M. Every point M′ in such an open ball has some ϵM′-sized open ball which fits entirely within the original open ball. Then, we can just consider our entourage V being
{(M,M′′)∈Msa(F(πpa))×Msa(F(πpa)):ds(M,M′′)≤ϵM′/2}
And then V[M′] is all points that are ϵM′/2 or less away from M′ according to the surmetric, and ds(M′,M′′)≥d(M′,M′′), so this is a subset of the ϵM′-sized ball around M′, which is a subset of the ball around M.

The extra measure we added in total on step n is (note that no nirvana-history can have a length of 0, so we start at 1, and t denotes timesteps in the history)

∑t∑hN:|hN|=t(2−(n+t)/#(t))=∑t#(t)⋅(2−(n+t)/#(t))≤∑t2−(n+t)=2−n∑t2−t≤2−n
So, as n increases, the deviation of this sequence of sa-measures from the limit sa-surmeasure approaches 0 w.r.t. the usual metric, and every component in this sequence agrees with the others and the limit sa-surmeasure on which nirvana events are possible or impossible, so it's a Cauchy sequence limiting to the sa-surmeasure of interest.
Thus, all parts have been shown. The surmetric induces the surtopology, and the Cauchy completion of the sa-measures w.r.t. the surmetric is the set of sa-surmeasures. The same proof works if you want a-surmeasures (get it from the a-measures), or surmeasures (get it from the measures).
Alright, now we can begin the lemma setup for the Isomorphism Theorem, which is the Big One. See you again at Lemma 21.
Lemma 3: If B⊆Ma(FNF(πst)) and B is nonempty, closed, convex, nirvana-free upper-complete, and has bounded-minimals, then c.h(Bmin)=c.h(Bxmin)
So, first, Bxmin refers to the set of extreme minimal points of B. An extreme point of B is one that cannot be written as a mixture of other points in B.
Proof Sketch: One subset direction, c.h(Bmin)⊇c.h(Bxmin) is immediate. For the other direction, we need a way to write a minimal point as a finite mixture of extreme minimal points. What we do is first show that all extreme points in B must lie below the λ⊙+b⊙ bound by crafting a way to write them as a mix of different points with upper completion if they violate the bound. Then, we slice off the top of B to get a compact convex set with all the original minimal (and extreme) points in it. Since πst is a policy stub, FNF(πst) has finitely many possible outcomes, so we're working in a finite-dimensional vector space. In finite dimensions, a convex compact set is the convex hull of its extreme points, which are all either (extreme points in B originally), or (points on the high hyperplane we sliced at). Further, a minimal point can only be made by mixing together other minimal points. Putting this together, our minimal point of interest can be made by mixing together extreme minimal points, and the other subset direction is immediate from there.
Proof: As stated in the proof sketch, one subset direction is immediate, so we'll work on the other one. To begin with, fix a Mex that is extreme in B. It's an a-measure. If Mex has λ+b>λ⊙+b⊙, then it's not minimal (B has bounded-minimals) so we can decompose it into a minimal point Mmin respecting the bound and some nonzero sa-measure M∗. Mex=Mmin+M∗. Now, consider the point Mmin+(m∗−,−m∗−(1)) instead. We're adding on the negative part of m∗, and just enough of a b term to compensate, so it's an sa-measure. The sum of these two points is an a-measure, because we already know from Mex being an a-measure that the negative part of m∗ isn't enough to make any negative parts when we add it to mmin.
Anyways, summing the two parts like that saps a bit from the λ value of Mmin, but adds an equal amount to the b value, so it lies below the λ⊙+b⊙ "barrier", and by nirvana-free upper-completeness, it also lies in B. Then, we can express Mex as
Mex=Mmin+M∗=Mmin+(m∗,b∗)=Mmin+(m∗−,−m∗−(1))+(m∗+,b∗+m∗−(1))
=0.5(Mmin+(m∗−,−m∗−(1)))+0.5(Mmin+(m∗−,−m∗−(1))+2(m∗+,b∗+m∗−(1)))
Now, we already know that Mmin+(m∗−,−m∗−(1)) is an a-measure, and (m∗+,b∗+m∗−(1)) is an a-measure (no negative parts, end term is ≥0). So, we just wrote our extreme point as a mix of two distinct a-measures, so it's not extreme. Contradiction. Therefore, all extreme points have λ+b≤λ⊙+b⊙.
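If you want to double-check the mixture identity above, here's a throwaway numeric sketch (measures as vectors over a finite outcome set, made-up numbers, b terms carried separately) verifying that the two halves really do average back to Mex.

```python
# Numeric sanity check of: M_ex = M_min + M_*
#   = 0.5*(M_min + (m*_-, -m*_-(1))) + 0.5*(M_min + (m*_-, -m*_-(1)) + 2*(m*_+, b* + m*_-(1)))
# Measures are numpy vectors over a fixed finite outcome set, plus a b term.
import numpy as np

m_min, b_min = np.array([0.4, 0.3, 0.0]), 0.2        # a minimal point (made-up numbers)
m_star, b_star = np.array([-0.1, 0.2, 0.3]), 0.15    # some nonzero sa-measure

m_neg = np.minimum(m_star, 0.0)                       # m*_-, the negative part
m_pos = np.maximum(m_star, 0.0)                       # m*_+, the positive part
neg_mass = m_neg.sum()                                # m*_-(1), which is <= 0

A = (m_min + m_neg, b_min - neg_mass)                 # M_min + (m*_-, -m*_-(1)), an a-measure
B = (m_pos, b_star + neg_mass)                        # (m*_+, b* + m*_-(1)), an a-measure

lhs = (m_min + m_star, b_min + b_star)                # M_ex
rhs = (0.5 * (A[0] + (A[0] + 2 * B[0])), 0.5 * (A[1] + (A[1] + 2 * B[1])))

assert np.allclose(lhs[0], rhs[0]) and np.isclose(lhs[1], rhs[1])
print("mixture identity checks out:", lhs)
```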
Let's resume. From bounded-minimals, we know that B has a suitable bound on λ+b, so the minimal points respect the λ⊙+b⊙ bound. Take B and chop it off at some high hyperplane, like λ+b≤2(λ⊙+b⊙). (the constant 2 isn't that important, it just has to be above 1 so we net all the extreme points and minimal points). Call the resulting set C. Due to the bound, and B being closed, C is compact by the Compactness Lemma. It's also convex.
Now, invoke the Krein-Milman theorem (and also, we're in a finite-dimensional space since we're working with a stub, finitely many observation leaf nodes, so we don't need to close afterwards, check the Wikipedia page for Krein-Milman Theorem at the bottom) to go C=c.h(extreme(C)). The only extreme points in C are either points that were originally extreme in B, or points on the high hyperplane that we chopped at.
Fix some Mmin∈Bmin. Bmin⊆C, so Mmin∈C. Thus, Mmin can be written as a finite mixture of points from extreme(C). However, because Mmin is minimal, it can only be a mixture of minimal points, as we will show now.
Decompose Mmin into EζMi, and then decompose the Mi into Mmini+M∗i . To derive a contradiction, assume there exists some i′ where Mi′ isn't minimal, so that M∗i′ isn't 0. Then,
Mmin=EζMi=Eζ(Mmini+M∗i)=Eζ(Mmini)+Eζ(M∗i)
Thus, we have decomposed our minimal point into another point which is also present in B (by convexity), and a nonzero sa-measure because M∗i′ is nonzero, so our original minimal point is actually nonminimal, and we have a contradiction. Therefore, all decompositions of a minimal point into a mix of points must have every component point being minimal as well.
So, when we decomposed Mmin into a mix of points in extreme(C), all the extreme points we decomposed it into are minimal, so there's no component on the high hyperplane. Mmin was arbitrary in Bmin, establishing that Bmin⊆c.h(Bxmin). Therefore, c.h(Bmin)⊆c.h(Bxmin).
So we have our desired result.
Lemma 4: If πpa≥πhist≥πlost, and A⊆Ma(F(πhist)) and B⊆Ma(F(πlost)) (also works with the nirvana-free variants) and prπhist,πlost∗(A)⊆B then (prπpa,πhist∗)−1(A)⊆(prπpa,πlost∗)−1(B). This works for surmeasures too.
A point M in the preimage of A has prπpa,πhist∗(M)∈A, and by projections commuting and projecting down further landing you in B, we get prπpa,πlost∗(M)∈B, so M is in the preimage of B too.
Lemma 5: Given a partial policy πpa and stub πst, if πpa≥πst, then ∃n:πnpa≥πst
πst is a stub that specifies less about what the policy does than πpa, and because it's a stub it has a minimum time beyond which it's guaranteed to be undefined, so just let that be your n. πnpa then specifies everything that πst does, and maybe more, because it has all the data of πpa up till time n.
Lemma 6: If πpa is a partial policy, and ∀πlost,πhist≥πlost:prπhist,πlost∗(Θst(πhist))⊆Θst(πlost) holds, then, for all m, ⋂πst≤πpa(prπpa,πst∗)−1(Θst(πst))=⋂n≥m(prπpa,πnpa∗)−1(Θst(πnpa)). This works for surmeasures too.
First, all the πnpa≤πpa are stubs, so we get one subset direction immediately.
⋂πst≤πpa(prπpa,πst∗)−1(Θst(πst))⊆⋂n≥m(prπpa,πnpa∗)−1(Θst(πnpa))
In the other direction, use Lemma 5 to find a πnpa≥πst, with n≥m, and then pair
∀πlost,πhist≥πlost:prπhist,πlost∗(Θst(πhist))⊆Θst(πlost)
with Lemma 4 to deduce that
(prπpa,πnpa∗)−1(Θst(πnpa))⊆(prπpa,πst∗)−1(Θst(πst))
Due to being able to take any stub preimage and find a smaller preimage amongst the fundamental sequence for πpa (with an initial segment clipped off), we don't need anything other than the preimages of the fundamental sequence (with an initial segment clipped off), which establishes the other direction and thus our result.
Lemma 7: If M is an a-measure and prπpa,πst∗(M)=M′ and M′=M′lo+M′∗ and M′lo is an a-measure, then there exists a Mlo∈Ma(F(πpa)) (or the nirvana-free variant) s.t. prπpa,πst∗(Mlo)=M′lo and there's an sa-measure M∗ s.t. Mlo+M∗=M. This works for a-surmeasures and sa-surmeasures too.
What this essentially says is "let's say we start with a M and project it down to M′, and then find a point M′lo below M′. Can we "go back up" and view M′lo as the projection of some point below M? Yes". It's advised to sketch out the setup of this one, if not the proof itself.
Proof sketch: To build our Mlo and M∗, the b components are preserved, but crafting the measure component for them is tricky. They've gotta project down to M′lo and M′∗ so those two give us our base case to start working from with the measures (and automatically get the "must project down appropriately" requirement) and then we can recursively build up by extending both of them with the conditional probabilities that M gives us. However, we must be wary of division-by-zero errors and accidentally assigning negative measure on Nirvana, which complicates things considerably. Once we've shown how to recursively build up the measure components of our Mlo and M∗, we then need to check four things. That they're both well formed (sum of measure on 1-step extensions of a history=measure on the history, no semimeasures here), that they sum up to make M, the measure component of Mlo can't go negative anywhere (to certify that it's an a-measure), and that the b term attached to M∗ is big enough to cancel out the negative regions (to certify that it's an sa-measure).
Proof: Let Mlo=(mlo,blo) where blo is the b term of M′lo. Let M∗=(m∗,b∗) where b∗ is the b term of M′∗. Recursively define mlo and m∗ on h that are prefixes of something in F(πpa) (or the nirvana-free variant) as follows:
If h is a prefix of something in F(πst) (or the nirvana-free variant), mlo(h)=m′lo(h) and m∗(h)=m′∗(h). That defines our base case. Now for how to inductively build up by mutual recursion. Let's use haN for a nirvana-history and hao for a non-nirvana history.
If m∗(h)<0, then
mlo(haN)=m(haN), mlo(hao)=m(hao)−m∗(h)/#o
m∗(haN)=0, m∗(hao)=m∗(h)/#o
#o is the number of non-nirvana observations that can come after ha.
If m∗(h)≥0 and m(h)>0, then
mlo(hao)=(m(hao)/m(h))⋅mlo(h), m∗(hao)=(m(hao)/m(h))⋅m∗(h)
and the same holds for defining mlo(haN) and m∗(haN).
If m∗(h)≥0 and m(h)=0, then mlo(hao)=m∗(hao)=0
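Here's a toy version of this recursion (my own representation and made-up numbers: a depth-2 tree, one stub level, nirvana as the observation "N", #o=2, and the b terms ignored) that exercises the negative and positive cases and then runs the checks described next.

```python
# Toy sketch of the Lemma 7 construction: given an a-measure m on a depth-2 tree,
# its projection m' to depth 1, and a split m' = m'_lo + m'_* at depth 1,
# build m_lo and m_* on the depth-2 tree via the three recursion cases, then
# check that m_lo + m_* = m and that m_lo has no negative parts.

LEAVES_AFTER = {("x",): ["o0", "o1", "N"], ("y",): ["o0", "o1", "N"]}
NUM_O = 2  # number of non-nirvana observations after any depth-1 node

m = {("x", "o0"): 0.3, ("x", "o1"): 0.3, ("x", "N"): 0.0,
     ("y", "o0"): 0.2, ("y", "o1"): 0.1, ("y", "N"): 0.1}

def mass_at(measure, h):   # measure of a prefix = sum over leaves extending it
    return sum(v for leaf, v in measure.items() if leaf[:len(h)] == h)

# A split of the projected measure at depth 1 (m'_lo an a-measure, m'_* its complement):
m_lo_stub = {("x",): 0.5, ("y",): 0.5}
m_st_stub = {h: mass_at(m, h) - m_lo_stub[h] for h in m_lo_stub}  # m'_* = m' - m'_lo

m_lo, m_st = {}, {}
for h, obs in LEAVES_AFTER.items():
    if m_st_stub[h] < 0:                               # negative case
        for o in obs:
            if o == "N":
                m_lo[h + (o,)], m_st[h + (o,)] = m[h + (o,)], 0.0
            else:
                m_lo[h + (o,)] = m[h + (o,)] - m_st_stub[h] / NUM_O
                m_st[h + (o,)] = m_st_stub[h] / NUM_O
    elif mass_at(m, h) > 0:                            # m_*(h) >= 0 and m(h) > 0
        for o in obs:
            ratio = m[h + (o,)] / mass_at(m, h)
            m_lo[h + (o,)] = ratio * m_lo_stub[h]
            m_st[h + (o,)] = ratio * m_st_stub[h]
    else:                                              # m_*(h) >= 0 and m(h) = 0
        for o in obs:
            m_lo[h + (o,)], m_st[h + (o,)] = 0.0, 0.0

for leaf in m:                                         # the checks from the proof
    assert abs(m_lo[leaf] + m_st[leaf] - m[leaf]) < 1e-9
    assert m_lo[leaf] >= -1e-9
print("m_lo:", m_lo)
print("m_*:", m_st)
```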
We need to verify that these sum up to m, that they're both well-formed signed measures, that mlo has no negative parts, and that the b value for M∗ is big enough. mlo having no negative parts is immediate by the way we defined it, because it's nonnegative on all the base cases since m′lo came from an a-measure, and m came from an a-measure as well which lets you use induction to transfer that property all the way up the histories.
To verify that they sum up to m, observe that for base-case histories in F(πst),
mlo(h)+m∗(h)=m′lo(h)+m′∗(h)=m′(h)=m(h)
For non-base-case histories (one-step extensions hao and haN) we can use induction (assume it's true for h) and go:
Negative case (m∗(h)<0): mlo(haN)+m∗(haN)=m(haN)+0=m(haN), and mlo(hao)+m∗(hao)=m(hao)−m∗(h)/#o+m∗(h)/#o=m(hao).
Positive case (m∗(h)≥0 and m(h)>0): mlo(hao)+m∗(hao)=(m(hao)/m(h))⋅(mlo(h)+m∗(h))=(m(hao)/m(h))⋅m(h)=m(hao) by the induction hypothesis, and the same argument works for haN.
Zero case (m∗(h)≥0 and m(h)=0): m(hao)=0 because m(h)=0 and m came from an a-measure and has no negative parts. mlo(hao)+m∗(hao)=0+0=0=m(hao).
Ok, so we've shown that mlo+m∗=m.
What about checking that they're well-formed signed measures? To do this, it suffices to check that summing up their measure-mass over haN and the haoi gets the measure-mass over h. This works over the base case, so we just have to check the induction steps. In the negative case, mlo(haN)+∑imlo(haoi)=m(haN)+∑im(haoi)−#o⋅(m∗(h)/#o)=m(h)−m∗(h)=mlo(h), and m∗(haN)+∑im∗(haoi)=0+#o⋅(m∗(h)/#o)=m∗(h). In the positive case where m(h)>0, the fractions m(haN)/m(h) and m(haoi)/m(h) sum to 1 over the one-step extensions of h, so the measure-masses over the extensions sum to mlo(h) and m∗(h) respectively.
In the zero case where m(h)=0, we need to show that mlo(h) and m∗(h) will also be zero. Winding h back, there's some longest prefix h′ where m(h′)>0. Now, knowing that m(h′)=mlo(h′)+m∗(h′), we have two possible options here.
In the first case, m∗(h′)≥0, so mlo(h′ao) (advancing one step) is:
mlo(h′ao)=(m(h′ao)/m(h′))⋅mlo(h′)=(0/m(h′))⋅mlo(h′)=0
And similar for m∗(h′ao), so they're both 0, along with m′, on h′ao, and then the zero case transfers the "they're both zero" property all the way up to h.
In the second case, m∗(h′)<0 and mlo(h′)>0. Then, proceeding forward, m∗(h′ao)<0, and this keeps holding all the way up to h, so we're actually in the negative case, not the zero case.
So, if m(h)=0, then mlo(h)=m∗(h)=0 as long as we're in the case where m∗(h)≥0 and m(h)=0. Then, it's easy. mlo(haN)+∑imlo(haoi)=0=mlo(h) and the same for m∗.
Also, m∗, by the way we defined it, never puts negative measure on a nirvana event, so we're good there, they're both well-formed signed measures. For the b∗ value being sufficient to compensate for the negative-measure of m∗, observe that the way we did the extension, the negative-measure for m∗ is the same as the negative measure for m′∗, and b∗=b′∗, and the latter is sufficient to cancel out the negative measure for m′∗, so we're good there.
We're done now, and this can be extended to a-surmeasures by taking the 0+ nirvana-events in m and saying that all those nirvana-events have 0+ measure in mlo.
Lemma 8: Having a λ+b bound on a set of a-surmeasures is sufficient to ensure that it's contained in a compact set w.r.t the surtopology.
This is the analogue of the Compactness Lemma for the sur-case. We'll keep it in the background instead of explicitly invoking it each time we go "there's a bound, therefore compactness". It's important.
Proof sketch: Given a sequence, the bound gets convergence of the measure part by the Compactness Lemma, and then we use Tychonoff to show that we can get a subsequence where the a-surmeasures start agreeing on which nirvana events are possible or impossible, for all nirvana events, so their first time of disagreement gets pushed arbitrarily far out, forcing convergence w.r.t. the surmetric.
Proof: given a sequence of a-surmeasures SMn, and rounding them off to their "standard" part (slicing off the 0+ probability), we can get a convergent subsequence, where the measure part and b part converges, by the Compactness Lemma since we have a λ+b bound, which translates into bounds on λ and b.
Now, all we need is a subsequence of that subsequence that ensures that, for each nirvana-event, the sequence of a-surmeasures starts agreeing on whether it's possible or impossible. There are countably many finite histories, and each nirvana-history is a finite history, so we index our nirvana events by natural numbers, and we can view our sequence as wandering around within {0,1}ω, where the t'th coordinate keeps track of whether the t'th nirvana event is possible or impossible. {0,1}ω is compact by Tychonoff's Theorem, so we can find a convergent subsequence, which corresponds to a sequence of a-surmeasures that, for any nirvana event, eventually start agreeing on whether it's possible or impossible. There's finitely many nirvana events before a certain finite time, so if we go far enough out in the n, the a-surmeasures agree on what nirvana events are possible or impossible for a very long time, and so the surdistance shrinks to 0 and they converge, establishing that all sequences have a convergent subsequence, so the set is compact.
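Here's a finite caricature of that subsequence-extraction step (everything about it, including the made-up possibility patterns, is mine): thin the sequence coordinate by coordinate so the survivors agree on every tracked nirvana event.

```python
# Finite caricature of the Tychonoff/subsequence step in Lemma 8: each a-surmeasure n
# has a 0/1 pattern saying which nirvana events it marks as possible, and we thin the
# sequence until all surviving indices agree on every tracked event.
T = 6  # number of nirvana events tracked in this toy; the real argument handles all of them

def pattern(n):
    """Made-up possibility pattern for the n-th a-surmeasure: the low T bits of n."""
    return tuple((n >> t) & 1 for t in range(T))

def agreeing_subsequence(indices):
    for t in range(T):
        values = [pattern(n)[t] for n in indices]
        keep = max(set(values), key=values.count)   # a value occurring "infinitely often"
        indices = [n for n in indices if pattern(n)[t] == keep]
    return indices

if __name__ == "__main__":
    subseq = agreeing_subsequence(list(range(200)))
    print(subseq[:8])
    print({pattern(n) for n in subseq})   # a single shared pattern survives
```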
Lemma 9: Given a πpa and a sequence of nonempty compact sets Bn⊆Ma(F(πnpa)) (or the nirvana-free variant) where ∀n:prπn+1pa,πnpa∗(Bn+1)⊆Bn, then there is a point M∈Ma(F(πpa)) (or the nirvana-free variant) where ∀n:prπpa,πnpa∗(M)∈Bn. This also works with a-surmeasures.
Sketch this one out. It's essentially "if sets get smaller and smaller, but not empty, as you ascend up the chain πnpa towards πpa, and are nested in each other, then there's something at the πpa level that projects down into all the πnpa"
Proof sketch: Projection preserves λ and b, the Compactness Lemma says that compactness means you have a λ and b bound, so the preimage of a compact set is compact. Then, we just have to verify the finite intersection property to show that the intersection of the preimages is nonempty, which is pretty easy since all our preimages are nested in each other like an infinite onion.
Proof: Consider the intersection ⋂n(prπpa,πnpa∗)−1(Bn). Because the Bn are all compact, they have a λ and b bound. Projection preserves the λ and b values, so the preimage of Bn has a λ and b bound, and therefore lies in a compact set (by Lemma 8 for the sur-case). The preimage of a closed set is also a closed set, so all these preimages are closed subsets of compact sets, hence compact. This is then an intersection of a family of compact sets, so we just need to check the finite intersection property. Fixing finitely many m, we can find an n above them all, pick an arbitrary point in the preimage of Bn, and invoke Lemma 4 to conclude that said point lies in all lower preimages, thus demonstrating the finite intersection property. Therefore, the intersection is nonempty.
Lemma 10: Given a sequence of nonempty closed sets Bn where prπn+1pa,πnpa∗(Bn+1)⊆Bn, and a sequence of points Mn∈(prπpa,πnpa∗)−1(Bn), all limit points of the sequence Mn (if they exist) lie in ⋂n(prπpa,πnpa∗)−1(Bn) (works in the a-surmeasure case)
Proof: Assume a limit point exists, isolate a subsequence limiting to it. By Lemma 4, the preimages are nested in each other. Also, the preimage of a closed set is closed. Thus, for our subsequence, past n, the points are in the preimage of Bn and don't ever leave, so the limit point is in the preimage of Bn. This argument works for all n, so the limit point is in the intersection of the preimages.
The next three Lemmas are typically used in close succession to establish nirvana-free upper-completeness for projecting down a bunch of nirvana-free upper complete sets, and taking the closed convex hull of them, which is an operation we use a lot. The first one says that projecting down a nirvana-free upper-complete set is upper-complete, the second one says that convex hull preserves the property, and the third one says that closure preserves the property. The first one requires building up a suitable measure via recursion on conditional probabilities, the second one requires building up a whole bunch of sa-measures via recursion on conditional probabilities and taking limits of them to get suitable stuff to mix together, and the third one also requires building up a whole bunch of sa-measures via recursion on conditional probabilities and then some fanciness with defining a limiting sequence.
Lemma 11: In the nirvana-free setting, a projection of an upper-complete set is upper-complete.
Proof sketch: To be precise about exactly what this says, since we're working with a-measures, it says "if you take the fragment of the upper completion composed of a-measures, and project it down, then the thing you get is: the fragment of the upper completion of (projected set) composed of a-measures". Basically, since we're not working in the full space of sa-measures, and just looking at the a-measure part of the upper completion, that's what makes this one tricky and not immediate.
The proof path is: Take an arbitrary point Mlo in the projection of B which has been crafted by projecting down Mhi. Given an arbitrary M′lo:=Mlo+M∗lo (assuming it's an a-measure) which lies in the upper completion of the projection of B, we need to certify that it's in the projection of B to show that the projection of B is upper-complete. In order to do this, we craft a M∗hi and M′hi (an a-measure) s.t: Mhi+M∗hi=M′hi (certifying that M′hi is in B since B is upper complete), and M′hi projects down to hit our M′lo point of interest.
These a-measures are crafted by starting with the base case of M∗lo and M′lo, and recursively building up the conditional probabilities in accordance with the conditional probabilities of Mhi. Then we just verify the basic conditions like the measures being well-formed, M′hi being an a-measure, M∗hi having a big enough b term, and Mhi+M∗hi=M′hi, to get our result. Working in the Nirvana-free case is nice since we don't need to worry about assigning negative measure to Nirvana.
Proof: Let B⊆Ma(FNF(πhipa)) be our upper-complete set. We want to show that prπhipa,πlopa∗(B) is upper-complete. To that end, fix a Mlo in the projection of B that's the projection of a Mhi∈B. Let M′lo:=Mlo+M∗lo, where M′lo is an a-measure. Can we find an a-measure M′hi in B that projects down to M′lo? Let's define M∗hi and M′hi as follows:
Let M′hi=(m′hi,b′hi) where b′hi is b′lo. Let M∗hi=(m∗hi,b∗hi) where b∗hi is b∗lo. Recursively define m′hi and m∗hi on h that are prefixes of something in FNF(πhipa) as follows:
If h is a prefix of something in F(πlopa), m′hi(h)=m′lo(h) and m∗hi(h)=m∗lo(h).
Otherwise, recursively define the measure components m′hi and m∗hi as:
We need to verify that m′hi has no negative parts so it's fitting for an a-measure, that mhi+m∗hi=m′hi, that the b value for M∗hi works, and that they're both well-formed signed measures. The first part is easy to establish: m′hi(h)=m′lo(h)≥0 in the base cases since M′lo is an a-measure, and a quick induction, along with mhi coming from the a-measure Mhi (so no negatives anywhere), establishes m′hi as not having any negative parts.
To verify that they sum up to m′hi, observe that for base-case histories in FNF(πlopa), mhi(h)+m∗hi(h)=mlo(h)+m∗lo(h)=m′lo(h)=m′hi(h). Then, in the general case, we can use induction (assume it's true for h) and go:
What about checking that they're well-formed measures? To do this, it suffices to check that summing up their measure-mass over haoi gets the measure-mass over h. If mhi(
Fair upfront warning: This is not a particularly readable proof section. There's a bunch of dense notation, logical leaps due to illusion of transparency since I've spent months getting fluent with these concepts, and a relative lack of editing since it's long. If you really want to read this, I'd suggest PM-ing me to get a link to MIRIxDiscord, where I'd be able to guide you through it and answer questions. This post will be recapping the notions and building up an arsenal of lemmas, the next one will show the isomorphism theorem, translation theorems, and behavior of mixing, and the last one is about updates and the decision-theory results. It's advised to have them open in different tabs and go between them as needed.
With that said, let's establish some notation and mental intuition. I'll err on the side of including more stuff because illusion of transparency. First, visualize the tree of alternating actions and observations in an environment. A full policy π can be viewed as that tree with some branches pruned off, specifying every history that's possible with your policy of interest. All our policies are deterministic. A policy stub πst is a policy tree that's been mostly pruned down (doesn't extend further than some finite time n). A partial policy πpa is just any policy tree in any state of specification or lack thereof, from tiny stubs to full policies to trees that are infinite down some branches but not others.
π∅ denotes the empty policy (a stub) which specifies nothing about what a policy does, and π¬h is some partial policy which specifies everything (acts like a full policy) everywhere except on history h and afterwards.
There's a distance metric on histories, as well as a distance metric on partial policies. Both of them are of the form γt where γ<1, and t is the "time of first difference". For histories, it's "what's the first time these histories differ", for policies, it's "what's the shortest time by which one partial policy is defined and the other is undefined, or where the policies differ on what to do". So, thinking of the distance as getting smaller as the time of first difference gets bigger is a reliable guide.
The outcome set F(πpa) is... take the tree corresponding to πpa, and it's the set of all the observation leaf nodes and infinite paths. No matter what, if you're interacting with an environment and acting according to πpa, the history you get is guaranteed to have, as a prefix, something in F(πpa). FNF(πpa) is that same set but minus all the Nirvana observations. Nirvana is a special observation which can occur at any time, counts as infinite reward, and ends the history. This is our compact metric space of interest that we're using to define a-measures and sa-measures. We assume that there's only finitely many discrete actions/observations available at any given point in time.
In this setting, sa-measures and a-measures over FNF(πpa) are defined as usual (a pair of a signed measure m and a number b where b+m−(1)≥0 for sa-measures, and a measure m with no negative parts and b≥0, respectively), because there's no infinite reward shenanigans. Sa-measures over F(πpa) require a technicality, though, which is that no nirvana event can have negative measure. λ will denote the total amount of measure you have. So, for a probability distribution, λ will be 1. We'll just use this for a-measures, and talk freely about the λ and b values of an a-measure. We use the KR-metric for measuring the distance between sa-measures (or a-measures), which is like "if two measures are really similar for a long time and then start diverging at late times, they're pretty similar." It's also equivalent to the earthmover distance, which is "how much effort does it take to rearrange the pile-of-dirt-that-is-this-measure into the pile-of-dirt-that-is-that-measure."
One important note. While m(f) is "what's the expectation of the continuous function f over histories, according to the measure we have", we frequently abuse notation and use m(h) to refer to what technically should be "what's the expectation of the indicator function for "this history has h as a prefix" w.r.t the measure". The reason we can do this is because the indicator function for the finite history h is a continuous function! So we can just view it as "what's the measure assigned to history h". Similarly, f★hg is the continuous function that's f on histories with h as a prefix and g on histories without h as a prefix.
For a given Ma(F(πpa)) or the nirvana-free variant, NF is just the subset of that where the measure components of the a-measures assign 0 measure to Nirvana occurring. They're safe from infinite reward. We suppress the dependency on πpa. Similarly,
EB(f)=inf(m,b)∈B∩NF(m(f)+b)
because if a Nirvana-containing measure was selected by Murphy, you'd get infinite expected value, so Murphy won't pick anything with Nirvana in it. Keep that in mind.
There's a fiddly thing to take into account about upper completion. We're usually working in the space of a-measures Ma(F(πpa)) or the nirvana-free equivalent. But the variant of upper completion we impose on our sets is: take the nirvana-free part of your set of interest, take the upper completion w.r.t the cone of nirvana-free sa-measures, then intersect with a-measures again. So, instead of the earlier setting where we could have any old sa-measure in our set and we could add any old sa-measure to them, now, since we're working purely in the space of a-measures and only demanding upper closure of the nirvana-free part, our notion of upper completion is something more like "start with a nirvana-free a-measure, you can add a nirvana-free sa-measure to it, and adding them has to make a nirvana-free a-measure"
Even worse, this is the notion of upper completion we impose, but for checking whether a point counts as minimal, we use the cone of sa-measures (with nirvana). So, for certifying that a point is non-minimal, we have to go "hey, there's another a-measure where we can add an sa-measure and make our point of interest". A different notion of upper completion here.
And, to make matters even worse, sometimes we do arguments involving the cone of sa-measures or nirvana-free sa-measures and don't impose the a-measure restriction. I'll try to clarify which case we're dealing with, but I can't guarantee it'll all be clear or sufficiently edited.
There's a partial ordering on partial policies, which is πpa≥π′pa if the two policies never disagree on which action to take, and πpa is defined on more histories than π′pa is (is a bigger tree). So, instead of viewing a partial policy as a tree, we can view the set of partial policies as a big poset. The full policies π are at the top, the empty policy π∅ is at the bottom. Along with this, we've got two important notions. One is the fundamental sequence of a partial policy. Envisioning it at the tree level, πnpa is "the tree that is πpa, just cut off at level n". Envisioning it at the poset level, the sequence πnpa is a chain of points in our poset ascending up to the point πpa.
Also synergizing with the partial-order view, we've got functions heading down the partial policy poset. prπhipa,πlopa∗ is the function that takes an a-measure or sa-measure over F(πhipa), and is like "ok, everything in F(πhipa) has a unique prefix in F(πlopa), push your measure component down, keep the b term the same". A good way of intuiting this is that this sort of projection describes what happens when you crunch down a measure over 10-bit-bitstrings to a measure over 8-bit-bitstrings. So view your poset of partial policies as being linked together by a bunch of projection arrows heading down.
There's a function Θ mapping each partial policy πpa to a set of a-measures over F(πpa) (or the nirvana-free variant), fulfilling some special properties. Maybe Θ is only defined over policy stubs or full policies, in which case we use Θst or Θω, respectively. So, the best mental visualization/sketching aid for a lot of the proofs is visualizing your partial policies of interest with an ordering on them where bigger=up and smaller=down, and have a set/point for each one, that organizes things fairly well and is how many of these proofs were created.
Every Θ (or the stub/full policy analogue) is associated with a λ⊙ and b⊙ value, which is "smallest upper bound on the λ of the minimal points of the Θ(πpa) sets" and "smallest upper bound on the b of the minimal points of the Θ(πpa) sets". Accordingly, the set {≤⊙} is defined as {(λμ,b)|λ+b≤λ⊙+b⊙}, and is a way of slicing out a bounded region of a set that contains all minimal points, if we need to do compactness arguments.
Finally, we'll reiterate two ultra-important results from basic inframeasure theory that get used a lot in here and will be tossed around casually for arguments without citing where they came from. There's the Compactness Lemma, which says that if you have a bound on the λ values and the b values of a closed set of a-measures, the set is compact. There's also Theorem 2, which says that you can break down any sa-measure into (minimal point + sa-measure), we use that decomposition a whole lot.
Other results we'll casually use are that projections commute (projecting down and then down again is the same as doing one big projection down), projections are linear (it doesn't matter whether you mix before or after projecting), projections don't expand distance (if two a-measures are ϵ apart before being projected down, they'll be ϵ or less apart after being projected down), if two a-measures are distinct inMa(F(πpa)), then the measure components differ at some finite time (or the b terms differ), so we can project down to some finite πnpa (same thing, just end history at time n) and they'll still be different, and projections preserve the λ and b value of an a-measure.
One last note, Ma(∞) is the space of a-measures on nirvana-free histories. This is all histories, not just the ones compatible with a specific policy. And a surmeasure SM is like a measure, but it can assign 0+ value to a nirvana event, marking it as "possible" even though it has 0 (arbitrarily low) measure.
Now, we can begin. Our first order of business is showing how the surtopology/surmetric/surmeasures are made and link together, but the bulk of this is the Isomorphism theorem. Which takes about 20 lemmas to set up first, in order to compress all the tools we need for it, and then the proof itself is extremely long. After that, things go a bit faster.
Lemma 1: d(x,x′):=max(d1(x,x′),d2(x,x′)) is a metric if d1 is a metric and d2 is a pseudometric.
For identity of indiscernibles, d(x,x′)=0→d1(x,x′)=0→x=x′ because d1 is a metric, and in the reverse direction, if x=x′, then d2(x,x′)=0 (pseudometrics have 0 distance from a point to itself) and d1(x,x′)=0, so d(x,x′)=0.
For symmetry, both metrics and pseudometrics have symmetry, so
d(x,x′)=max(d1(x,x′),d2(x,x′))=max(d1(x′,x),d2(x′,x))=d(x′,x)
For triangle inequality, both metrics and pseudometrics fulfill the triangle inequality, so
d(x,z)=max(d1(x,z)+d2(x,z))≤max(d1(x,y)+d1(y,z),d2(x,y)+d2(y,z))
≤max(d1(x,y),d2(x,y))+max(d1(y,z),d2(y,z))=d(x,y)+d(y,z)
And we're done.
Lemma 2: The surmetric is a metric.
To recap, the surmetric over sa-measures is
ds((m,b),(m′,b′)):=max(d(m,m′)+|b−b′|,γt(m,m′))
where γ<1, and t(m,m′) is the minimum length of a Nirvana-containing history that has positive measure according to m and 0 measure according to m′ (or vice-versa) We'll show that γt(m,m′) acts as a pseudometric, and then invoke Lemma 1.
The first three conditions of nonnegativity, γt(m,m)=γ∞=0, and symmetry are immediate. That just leaves checking the triangle inequality. Let t1:=t(m1,m2), t2:=t(m2,m3), and t3:=t(m1,m3).
Assume t3<min(t1,t2). Then, going from m1 to m2, all changes in the possibility of a Nirvana-history take place strictly after t3, and going from m2 to m3, all changes in the possibility of a Nirvana-history also take place strictly after t3, so m1 and m3 behave identically (w.r.t. Nirvana-possibilities) up to and including time t3, which is impossible, because of t3 being "what's the shortest time where m1 and m3 disagree on the possiblility of a Nirvana-history".
Therefore, t3≥min(t1,t2). and this case is ruled out.
In one case, either t1 or t2 are >t3. Without loss of generality, assume it's t1. Then, γt3<γt1≤γt1+γt2 and the triangle inequality is shown.
In the other case, t1=t2=t3. In that case, γt3=γt1≤γt1+γt2 And the triangle inequality is shown.
Therefore, γt(m,m′) is a pseudometric. Now, we can invoke Lemma 1 to show that ds is a metric.
Theorem 1: The surmetric on the space of sa-measures Ma(F(πpa)) induces the surtopology. The Cauchy completion of Ma(F(πpa)) w.r.t the surmetric is exactly the space of sa-surmeasures.
Proof sketch: First, use the metric to get an entourage (check the Wikipedia page on "Uniform Space"), and use the entourage to get a topology. Then, we go in both directions, and check that entourage-open sets are open according to the surtopology and the surtopology subbasis sets are entourage-open, to conclude that the topology induced by the surmetric is exactly the surtopology. Then, for the Cauchy completion, we'll show a bijection between equivalence classes of Cauchy sequences w.r.t. the surmetric and sa-surmeasures.
The surmetric is ds((m,b),(m′,b′)):=max(d(m,m′)+|b−b′|,γt(m,m′)) where γ<1, and t is the minimum length of a Nirvana-containing history that has positive measure according to m and 0 measure according to m′ (or vice-versa)
From the Wikipedia page on "Uniform Space", a fundamental system of entourages for Msa(F(πpa)) is given by
{(M,M′)∈Msa(F(πpa))×Msa(F(πpa)):ds(M,M′)≤ϵ}
A set O is open w.r.t. the uniformity iff for all M∈O, there exists an entourage V where V[M] lies entirely within O (wikipedia page). Because V is a subset of Ma(F(πpa))×Ma(F(πpa)), V[M] is the set of all the second components paired with a given sa-measure.
So, let's say O is open w.r.t. the uniformity. Then, for all M∈O, there's an entourage V where V[M] lies entirely within O. A fundamental system of entourages has the property that every entourage is a superset of some set from the fundamental system. Thus, from our earlier definition of the fundamental system, there exists some ϵM where
{M′∈Msa(F(πpa)):ds(M,M′)≤ϵM}⊆O
We'll construct an open set from the surtopology that is a subset of this set and contains M, as follows. First, observe that if ds(M,M′)≤ϵM, then d(M,M′)≤ϵM, and γt(m,m′)≤ϵM. For the latter, there are finitely many nirvana-containing histories with a length less than logγ(ϵM)+1, and if a M′ matches M w.r.t. which nirvana-containing histories of that finite set are possible or impossible, then γt(m,m′)<ϵM (because M and M′ only differ on which Nirvana-histories are possible at very late times)
Accordingly, intersect the following sets:
1: the open ball centered at M with a size of ϵM
2: For all the nirvana-histories hN where |hN|≤logγ(ϵM)+1 and m(hN)>0, intersect all the sets of a-measures where that history has positive measure. These are open because they're the complements of "this finite history has zero measure", which is a closed set of sa-measures.
3: For all the nirvana-histories hN where |hN|≤logγ(ϵM)+1 and m(hN)=0, intersect all the sets of a-measures where that nirvana-history has 0 measure. These are open because they are subbasis sets for the surtopology.
We intersected finitely many open sets, so the result is open. Due to 2 and 3 and our earlier discussion, any M′ in the intersection must have γt(m,m′)<ϵM. Due to 1, d(M,M′)<ϵM.
This finite intersection of open sets (in the surtopology) produces an open set that contains M (obviously) and is a subset of {M′∈Msa(F(πpa)):ds(M,M′)≤ϵM}, which is a subset of V[M] which is a subset of O.
Because this argument can be applied to every point M∈O to get an open set (in the surtopology) that contains M and is a subset of O, we can make O itself by just unioning all our open sets together, which shows that O is open in the surtopology.
In the reverse direction, let's show that all sets in the subbasis of the surtopology are open w.r.t. the uniform structure.
First, we'll address the open balls around a point M. Every point M′ in such an open ball has some ϵM′-sized open ball which fits entirely within the original open ball. Then, we can just consider our entourage V being
{M,M′′∈Msa(F(πpa))×Msa(F(πpa)):ds(M,M′′)≤ϵM′2}
And then V[M′] is all points that are ϵM′2 or less away from M′ according to the surmetric, and ds(M′,M′′)≥d(M′,M′′) so this is a subset of the ϵM′-sized ball around M′, which is a subset of the ball around M. The extra measure we added in total on step n is (note that no nirvana-history can have a length of 0, so we start at 1, and t denotes timesteps in the history)
∑t∑hN:|hN|=t2−(n+t)#(t)=∑t#(t)2−(n+t)#(t)≤∑t2−(n+t)=2−n∑t2−t≤2−n
So, as n increases, the deviation of this sequence of sa-measures from the limit sa-surmeasure approaches 0 w.r.t. the usual metric, and every component in this sequence agrees with the others and the limit sa-surmeasure on which nirvana events are possible or impossible, so it's a Cauchy sequence limiting to the sa-surmeasure of interest.
Thus, all parts have been shown. The surmetric induces the surtopology, and the Cauchy completion of the sa-measures w.r.t. the surmetric is the set of sa-surmeasures. The same proof works if you want a-surmeasures (get it from the a-measures), or surmeasures (get it from the measures).
Alright, now we can begin the lemma setup for the Isomorphism Theorem, which is the Big One. See you again at Lemma 21.
Lemma 3: If B⊆Ma(FNF(πst)) and B is nonempty, closed, convex, nirvana-free upper-complete, and has bounded-minimals, then c.h(Bmin)=c.h(Bxmin)
So, first, Bxmin refers to the set of extreme minimal points of B. An extreme point of B is one that cannot be written as a mixture of other points in B.
Proof Sketch: One subset direction, c.h(Bmin)⊇c.h(Bxmin) is immediate. For the other direction, we need a way to write a minimal point as a finite mixture of extreme minimal points. What we do is first show that all extreme points in B must lie below the λ⊙+b⊙ bound by crafting a way to write them as a mix of different points with upper completion if they violate the bound. Then, we slice off the top of B to get a compact convex set with all the original minimal (and extreme) points in it. Since πst is a policy stub, FNF(πst) has finitely many possible outcomes, so we're working in a finite-dimensional vector space. In finite dimensions, a convex compact set is the convex hull of its extreme points, which are all either (extreme points in B originally), or (points on the high hyperplane we sliced at). Further, a minimal point can only be made by mixing together other minimal points. Putting this together, our minimal point of interest can be made by mixing together extreme minimal points, and the other subset direction is immediate from there.
Proof: As stated in the proof sketch, one subset direction is immediate, so we'll work on the other one. To begin with, fix a Mex that is extreme in B. It's an a-measure. If Mex has λ+b>λ⊙+b⊙, then it's not minimal (B has bounded-minimals) so we can decompose it into a minimal point Mmin respecting the bound and some nonzero sa-measure M∗. Mex=Mmin+M∗. Now, consider the point Mmin+(m∗−,−m∗−(1)) instead. We're adding on the negative part of m∗, and just enough of a b term to compensate, so it's an sa-measure. The sum of these two points is an a-measure, because we already know from Mex being an a-measure that the negative part of m∗ isn't enough to make any negative parts when we add it to mmin.
Anyways, summing the two parts like that saps a bit from the λ value of Mmin, but adds an equal amount on the b value, so it lies below the λ⊙+b⊙ "barrier", and by nirvana-free upper-completeness, it also lies in B. Then, we can express Mex as
Mex=Mmin+M∗=Mmin+(m∗,b∗)=Mmin+(m∗−,−m∗−(1))+(m∗+,b∗+m∗−(1))
=0.5(Mmin+(m∗−,−m∗−(1)))+0.5(Mmin+(m∗−,−m∗−(1))+2(m∗+,b∗+m∗−(1)))
Now, we already know that Mmin+(m∗−,−m∗−(1)) is an a-measure, and (m∗+,b∗+m∗−(1)) is an a-measure (no negative parts, end term is ≥0). So, we just wrote our extreme point as a mix of two distinct a-measures, so it's not extreme. Contradiction. Therefore, all extreme points have λ+b≤λ⊙+b⊙.
Let's resume. From bounded-minimals, we know that B has a suitable bound on λ+b, so the minimal points respect the λ⊙+b⊙ bound. Take B and chop it off at some high hyperplane, like λ+b≤2(λ⊙+b⊙). (the constant 2 isn't that important, it just has to be above 1 so we net all the extreme points and minimal points). Call the resulting set C. Due to the bound, and B being closed, C is compact by the Compactness Lemma. It's also convex.
Now, invoke the Krein-Milman theorem to get C=c.h(extreme(C)). (Since we're working with a stub, there are only finitely many observation leaf nodes, so we're in a finite-dimensional vector space and don't need to take the closure of the convex hull; see the finite-dimensional remark at the bottom of the Wikipedia page for the Krein-Milman theorem.) The only extreme points in C are either points that were originally extreme in B, or points on the high hyperplane that we chopped at.
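As a toy illustration of the finite-dimensional fact we're leaning on (a compact convex set is the convex hull of its extreme points, and for the hull of finitely many points the extreme points are just the vertices), here's a small sketch; the points and the use of scipy are purely illustrative, not part of the proof:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four made-up points in 2 dimensions (think of the coordinates as measure-masses, or (lambda, b)).
points = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]])

hull = ConvexHull(points)
extreme = points[hull.vertices]   # vertices of the hull = extreme points of the convex hull

# (1, 1) = (1/3)(1,0) + (1/3)(0,1) + (1/3)(2,2) is a mixture of the others, so it's not extreme.
print(extreme)                    # only the first three points show up
```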
Fix some Mmin∈Bmin. Bmin⊆C, so Mmin∈C. Thus, Mmin can be written as a finite mixture of points from extreme(C). However, because Mmin is minimal, it can only be a mixture of minimal points, as we will show now.
Decompose Mmin into EζMi, and then decompose each Mi into Mmini+M∗i. To derive a contradiction, assume there exists some i′ where Mi′ isn't minimal, so that M∗i′ isn't 0. Then,
$$M_{min}=\mathbb{E}_\zeta M_i=\mathbb{E}_\zeta(M^{min}_i+M^*_i)=\mathbb{E}_\zeta(M^{min}_i)+\mathbb{E}_\zeta(M^*_i)$$
Thus, we have decomposed our minimal point into another point that also lies in B (it's a mixture of points of B, and B is convex), plus a nonzero sa-measure (nonzero because M∗i′ is nonzero), so our original minimal point is actually nonminimal, and we have a contradiction. Therefore, all decompositions of a minimal point into a mix of points must have every component point being minimal as well.
So, when we decomposed Mmin into a mix of points in extreme(C), all the extreme points we decomposed it into are minimal, so there's no component on the high hyperplane. Mmin was arbitrary in Bmin, establishing that Bmin⊆c.h(Bxmin). Therefore, c.h(Bmin)⊆c.h(Bxmin).
So we have our desired result.
Lemma 4: If $\pi_{pa}\ge\pi^{hi}_{st}\ge\pi^{lo}_{st}$, and $A\subseteq M_a(F(\pi^{hi}_{st}))$ and $B\subseteq M_a(F(\pi^{lo}_{st}))$ (also works with the nirvana-free variants), and $pr^{\pi^{hi}_{st},\pi^{lo}_{st}}_*(A)\subseteq B$, then $(pr^{\pi_{pa},\pi^{hi}_{st}}_*)^{-1}(A)\subseteq(pr^{\pi_{pa},\pi^{lo}_{st}}_*)^{-1}(B)$. This works for surmeasures too.
A point M in the preimage of A has $pr^{\pi_{pa},\pi^{hi}_{st}}_*(M)\in A$, and by projections commuting (projecting down further lands you in B), we get $pr^{\pi_{pa},\pi^{lo}_{st}}_*(M)\in B$, so M is in the preimage of B too.
Lemma 5: Given a partial policy $\pi_{pa}$ and stub $\pi_{st}$, if $\pi_{pa}\ge\pi_{st}$, then $\exists n:\pi^n_{pa}\ge\pi_{st}$
πst is a stub that specifies less about what the policy does than πpa, and because it's a stub it has a minimum time beyond which it's guaranteed to be undefined, so just let that be your n. πnpa then specifies everything that πst does, and maybe more, because it has all the data of πpa up till time n.
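If it helps to have something concrete to push around, here's a minimal sketch of what Lemma 5 is saying, with partial policies modeled as finite dictionaries from histories (tuples of alternating actions and observations) to actions. The representation, helper names, and the specific toy policies are all my own illustration, not notation from the post:

```python
# A partial policy is modeled as {history: action}; histories are tuples of
# alternating actions/observations. "pi_hi >= pi_lo" means pi_hi is defined
# wherever pi_lo is, and agrees with it there.
def refines(pi_hi, pi_lo):
    return all(h in pi_hi and pi_hi[h] == a for h, a in pi_lo.items())

def truncate(pi_pa, n):
    """pi_pa^n: keep only what pi_pa specifies before time n."""
    return {h: a for h, a in pi_pa.items() if len(h) < n}

# Toy example: pi_st is a stub (specifies nothing past time 2), pi_pa extends it further.
pi_st = {(): "a0", ("a0", "o1"): "a1"}
pi_pa = {(): "a0", ("a0", "o1"): "a1", ("a0", "o2"): "a0",
         ("a0", "o1", "a1", "o1"): "a0"}

# Lemma 5: pi_st has a finite time beyond which it's undefined, so truncating
# pi_pa at that time already refines pi_st.
n = 1 + max(len(h) for h in pi_st)
assert refines(pi_pa, pi_st) and refines(truncate(pi_pa, n), pi_st)
```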
Lemma 6: If $\pi_{pa}$ is a partial policy, and $\forall\pi^{lo}_{st},\pi^{hi}_{st}\ge\pi^{lo}_{st}:pr^{\pi^{hi}_{st},\pi^{lo}_{st}}_*(\Theta_{st}(\pi^{hi}_{st}))\subseteq\Theta_{st}(\pi^{lo}_{st})$ holds, then, for all m, $\bigcap_{\pi_{st}\le\pi_{pa}}(pr^{\pi_{pa},\pi_{st}}_*)^{-1}(\Theta_{st}(\pi_{st}))=\bigcap_{n\ge m}(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(\Theta_{st}(\pi^n_{pa}))$. This works for surmeasures too.
First, all the πnpa≤πpa are stubs, so we get one subset direction immediately.
$$\bigcap_{\pi_{st}\le\pi_{pa}}(pr^{\pi_{pa},\pi_{st}}_*)^{-1}(\Theta_{st}(\pi_{st}))\subseteq\bigcap_{n\ge m}(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(\Theta_{st}(\pi^n_{pa}))$$
In the other direction, use Lemma 5 to find a πnpa≥πst, with n≥m, and then pair
$$\forall\pi^{lo}_{st},\pi^{hi}_{st}\ge\pi^{lo}_{st}:pr^{\pi^{hi}_{st},\pi^{lo}_{st}}_*(\Theta_{st}(\pi^{hi}_{st}))\subseteq\Theta_{st}(\pi^{lo}_{st})$$
with Lemma 4 to deduce that
$$(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(\Theta_{st}(\pi^n_{pa}))\subseteq(pr^{\pi_{pa},\pi_{st}}_*)^{-1}(\Theta_{st}(\pi_{st}))$$
Since any stub preimage contains a smaller preimage from the fundamental sequence for πpa (with an initial segment clipped off), we don't need anything other than the preimages of the fundamental sequence (with an initial segment clipped off), which establishes the other direction and thus our result.
Lemma 7: If M is an a-measure and $pr^{\pi_{pa},\pi_{st}}_*(M)=M'$ and $M'=M'_{lo}+M'_*$ where $M'_{lo}$ is an a-measure, then there exists a $M_{lo}\in M_a(F(\pi_{pa}))$ (or the nirvana-free variant) s.t. $pr^{\pi_{pa},\pi_{st}}_*(M_{lo})=M'_{lo}$, and there's an sa-measure $M_*$ s.t. $M_{lo}+M_*=M$. This works for a-surmeasures and sa-surmeasures too.
What this essentially says is "let's say we start with a M and project it down to M′, and then find a point M′lo below M′. Can we "go back up" and view M′lo as the projection of some point below M? Yes". It's advised to sketch out the setup of this one, if not the proof itself.
Proof sketch: To build our Mlo and M∗, the b components are preserved, but crafting the measure component for them is tricky. They've gotta project down to M′lo and M′∗ so those two give us our base case to start working from with the measures (and automatically get the "must project down appropriately" requirement) and then we can recursively build up by extending both of them with the conditional probabilities that M gives us. However, we must be wary of division-by-zero errors and accidentally assigning negative measure on Nirvana, which complicates things considerably. Once we've shown how to recursively build up the measure components of our Mlo and M∗, we then need to check four things. That they're both well formed (sum of measure on 1-step extensions of a history=measure on the history, no semimeasures here), that they sum up to make M, the measure component of Mlo can't go negative anywhere (to certify that it's an a-measure), and that the b term attached to M∗ is big enough to cancel out the negative regions (to certify that it's an sa-measure).
Proof: Let Mlo=(mlo,blo) where blo is the b term of M′lo. Let M∗=(m∗,b∗) where b∗ is the b term of M′∗. Recursively define mlo and m∗ on h that are prefixes of something in F(πpa) (or the nirvana-free variant) as follows:
If h is a prefix of something in F(πst) (or the nirvana-free variant), mlo(h)=m′lo(h) and m∗(h)=m′∗(h). That defines our base case. Now for how to inductively build up by mutual recursion. Let's use haN for a nirvana-history and hao for a non-nirvana history.
If m∗(h)<0, then
$$m_{lo}(haN)=m(haN),\quad m_{lo}(hao)=m(hao)-\frac{m_*(h)}{\#o}$$
$$m_*(haN)=0,\quad m_*(hao)=\frac{m_*(h)}{\#o}$$
#o is the number of non-nirvana observations that can come after ha.
If m∗(h)≥0 and m(h)>0, then
$$m_{lo}(hao)=\frac{m(hao)}{m(h)}m_{lo}(h),\quad m_*(hao)=\frac{m(hao)}{m(h)}m_*(h)$$
and the same holds for defining mlo(haN) and m∗(haN).
If m∗(h)≥0 and m(h)=0, then mlo(hao)=m∗(hao)=0
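Since the case analysis is easy to lose track of, here's a minimal sketch of the recursion in Python, with histories as tuples and a measure stored as a dict from each history (prefix) to its mass; the tree structure, helper names, and numbers are a hypothetical toy instance, and it only covers the construction, not the verification that follows:

```python
def extend_measures(m, children, is_nirvana, base_lo, base_star):
    """Extend m'_lo and m'_* (given on the stub-level histories via base_lo and
    base_star) up the pi_pa tree, following the three cases above."""
    m_lo, m_star = dict(base_lo), dict(base_star)

    def recurse(h):
        kids = children.get(h, [])
        non_nirvana = [c for c in kids if not is_nirvana.get(c, False)]
        for c in kids:
            if c not in m_lo:                 # past the stub: define it now
                if m_star[h] < 0:             # negative case: spread m_*(h) over non-nirvana children
                    m_star[c] = 0.0 if is_nirvana.get(c, False) else m_star[h] / len(non_nirvana)
                    m_lo[c] = m[c] - m_star[c]
                elif m[h] > 0:                # nonnegative case: follow m's conditional probabilities
                    m_lo[c] = m[c] / m[h] * m_lo[h]
                    m_star[c] = m[c] / m[h] * m_star[h]
                else:                         # zero case: nothing below h
                    m_lo[c], m_star[c] = 0.0, 0.0
            recurse(c)

    recurse(())
    return m_lo, m_star

# Toy tree: the stub only specifies the root action, pi_pa continues after ("a","o1"),
# and one of the continuations is a nirvana observation N.
children = {(): [("a", "o1"), ("a", "o2")],
            ("a", "o1"): [("a", "o1", "b", "o1"), ("a", "o1", "b", "N")]}
is_nirvana = {("a", "o1", "b", "N"): True}
m = {(): 1.0, ("a", "o1"): 0.6, ("a", "o2"): 0.4,
     ("a", "o1", "b", "o1"): 0.5, ("a", "o1", "b", "N"): 0.1}
base_lo = {(): 0.9, ("a", "o1"): 0.7, ("a", "o2"): 0.2}     # m'_lo on stub-level histories
base_star = {(): 0.1, ("a", "o1"): -0.1, ("a", "o2"): 0.2}  # m'_* on stub-level histories

m_lo, m_star = extend_measures(m, children, is_nirvana, base_lo, base_star)
for h in m:                                   # the properties checked in the rest of the proof
    assert abs(m_lo[h] + m_star[h] - m[h]) < 1e-9 and m_lo[h] >= 0
for h, kids in children.items():              # well-formedness: children's mass sums to the parent's
    assert abs(sum(m_lo[c] for c in kids) - m_lo[h]) < 1e-9
    assert abs(sum(m_star[c] for c in kids) - m_star[h]) < 1e-9
```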
We need to verify that these sum up to m, that they're both well-formed signed measures, that mlo has no negative parts, and that the b value for M∗ is big enough. mlo having no negative parts is immediate from the way we defined it: it's nonnegative on all the base cases (since m′lo came from an a-measure), and m came from an a-measure as well, which lets you use induction to transfer that property all the way up the histories.
To verify that they sum up to m, observe that for base-case histories in F(πst),
$$m_{lo}(h)+m_*(h)=m'_{lo}(h)+m'_*(h)=m'(h)=m(h)$$
For non-base-case histories hao we can use induction (assume it's true for h) and go:
Negative case, m∗(h)<0.
$$m_{lo}(haN)+m_*(haN)=m(haN)+0=m(haN)$$
$$m_{lo}(hao)+m_*(hao)=m(hao)-\frac{m_*(h)}{\#o}+\frac{m_*(h)}{\#o}=m(hao)$$
Nonnegative case, no division by zero.
$$m_{lo}(hao)+m_*(hao)=\frac{m(hao)}{m(h)}(m_{lo}(h)+m_*(h))=\frac{m(hao)}{m(h)}m(h)=m(hao)$$
Zero case: m(hao)=0, because m(h)=0 and m came from an a-measure so it has no negative parts. $m_{lo}(hao)+m_*(hao)=0+0=0=m(hao)$
Ok, so we've shown that mlo+m∗=m.
What about checking that they're well-formed signed measures? To do this, it suffices to check that summing up their measure-mass over the one-step extensions haoi (and haN) gets the measure-mass over h. This works over the base case, so we just have to check the induction steps.
In the negative case, for m∗,
$$m_*(haN)+\sum_i m_*(hao_i)=\sum_i\frac{m_*(h)}{\#o}=m_*(h)$$
and for mlo
$$m_{lo}(haN)+\sum_i m_{lo}(hao_i)=m(haN)+\sum_i\left(m(hao_i)-\frac{m_*(h)}{\#o}\right)=m(h)-m_*(h)=m_{lo}(h)$$
In the nonnegative case, no division by zero, then
$$m_{lo}(haN)+\sum_i m_{lo}(hao_i)=\frac{m(haN)}{m(h)}m_{lo}(h)+\sum_i\frac{m(hao_i)}{m(h)}m_{lo}(h)$$
$$=\frac{m_{lo}(h)}{m(h)}\left(m(haN)+\sum_i m(hao_i)\right)=\frac{m_{lo}(h)}{m(h)}m(h)=m_{lo}(h)$$
And similar for m∗.
In the zero case where m(h)=0, we need to show that mlo(h) and m∗(h) will also be zero. Winding h back, there's some longest prefix h′ where m(h′)>0. Now, knowing that m(h′)=mlo(h′)+m∗(h′), we have two possible options here.
In the first case, m∗(h′)≥0, so mlo(h′ao) (advancing one step) is:
$$m_{lo}(h'ao)=\frac{m(h'ao)}{m(h')}m_{lo}(h')=\frac{0}{m(h')}m_{lo}(h')=0$$
And similar for m∗(h′ao), so they're both 0, along with m, on h′ao, and then the zero case transfers the "they're both zero" property all the way up to h.
In the second case, m∗(h′)<0 and mlo(h′)>0. Then, proceeding forward, m∗(h′ao)<0, and this keeps holding all the way up to h, so we're actually in the negative case, not the zero case.
So, if m(h)=0 and we're in the zero case (where m∗(h)≥0), then mlo(h)=m∗(h)=0. Then it's easy: $m_{lo}(haN)+\sum_i m_{lo}(hao_i)=0=m_{lo}(h)$, and the same for m∗.
Also, m∗, by the way we defined it, never puts negative measure on a nirvana event, so we're good there, they're both well-formed signed measures. For the b∗ value being sufficient to compensate for the negative-measure of m∗, observe that the way we did the extension, the negative-measure for m∗ is the same as the negative measure for m′∗, and b∗=b′∗, and the latter is sufficient to cancel out the negative measure for m′∗, so we're good there.
We're done now, and this can be extended to a-surmeasures by taking the 0+ nirvana-events in m and saying that all those nirvana-events have 0+ measure in mlo.
Lemma 8: Having a λ+b bound on a set of a-surmeasures is sufficient to ensure that it's contained in a compact set w.r.t. the surtopology.
This is the analogue of the Compactness Lemma for the sur-case. We'll keep it in the background instead of explicitly invoking it each time we go "there's a bound, therefore compactness". It's important.
Proof sketch: Given a sequence, the bound gets convergence of the measure part by the Compactness Lemma, and then we use Tychonoff to show that we can get a subsequence where the a-surmeasures start agreeing on which nirvana events are possible or impossible, for all nirvana events, so their first time of disagreement gets pushed arbitrarily far out, forcing convergence w.r.t. the surmetric.
Proof: Given a sequence of a-surmeasures SMn, round them off to their "standard" part (slicing off the 0+ probability). Since we have a λ+b bound, which translates into bounds on λ and b, the Compactness Lemma gets us a convergent subsequence where the measure part and b part converge.
Now, all we need is a subsequence of that subsequence that ensures that, for each nirvana-event, the sequence of a-surmeasures starts agreeing on whether it's possible or impossible. There are countably many finite histories, and each nirvana-history is a finite history, so we index our nirvana events by natural numbers, and we can view our sequence as wandering around within {0,1}ω, where the t'th coordinate keeps track of whether the t'th nirvana event is possible or impossible. {0,1}ω is compact by Tychonoff's Theorem, so we can find a convergent subsequence, which corresponds to a sequence of a-surmeasures that, for any nirvana event, eventually start agreeing on whether it's possible or impossible. There's finitely many nirvana events before a certain finite time, so if we go far enough out in the n, the a-surmeasures agree on what nirvana events are possible or impossible for a very long time, and so the surdistance shrinks to 0 and they converge, establishing that all sequences have a convergent subsequence, so the set is compact.
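The Tychonoff step is really just an iterated pigeonhole/diagonal argument. Here's a hedged little sketch of that extraction step in isolation, where each a-surmeasure is abstracted down to a tuple of 0/1 flags saying which of the first T enumerated nirvana events it considers possible (the random data, T, and N are purely illustrative, and the finite loop stands in for the infinite diagonalization):

```python
import random

# Each "a-surmeasure" is reduced to its possibility flags for the first T nirvana events.
random.seed(0)
T, N = 6, 200
flags = [tuple(random.randint(0, 1) for _ in range(T)) for _ in range(N)]

# Iterated pigeonhole: coordinate by coordinate, keep whichever value survives in the
# larger half, so the surviving subsequence agrees on more and more nirvana events.
subseq = list(range(N))
for t in range(T):
    zeros = [i for i in subseq if flags[i][t] == 0]
    ones = [i for i in subseq if flags[i][t] == 1]
    subseq = max(zeros, ones, key=len)

# Everything left agrees on all T events, so (in the infinite version) the time of first
# disagreement gets pushed out arbitrarily far and the surdistance goes to 0.
assert len({flags[i] for i in subseq}) == 1
```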
Lemma 9: Given a $\pi_{pa}$ and a sequence of nonempty compact sets $B_n\subseteq M_a(F(\pi^n_{pa}))$ (or the nirvana-free variant) where $\forall n:pr^{\pi^{n+1}_{pa},\pi^n_{pa}}_*(B_{n+1})\subseteq B_n$, then there is a point $M\in M_a(F(\pi_{pa}))$ (or the nirvana-free variant) where $\forall n:pr^{\pi_{pa},\pi^n_{pa}}_*(M)\in B_n$. This also works with a-surmeasures.
Sketch this one out. It's essentially "if the sets Bn get smaller and smaller, but not empty, as you ascend up the chain πnpa towards πpa, and are nested in each other under projection, then there's something at the πpa level that projects down into all the Bn".
Proof sketch: Projection preserves λ and b, the Compactness Lemma says that compactness means you have a λ and b bound, so the preimage of a compact set is compact. Then, we just have to verify the finite intersection property to show that the intersection of the preimages is nonempty, which is pretty easy since all our preimages are nested in each other like an infinite onion.
Proof: Consider the intersection $\bigcap_n(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(B_n)$. Because the Bn are all compact, they have a λ and b bound. Projection preserves the λ and b values, so the preimage of Bn has a λ and b bound, and therefore lies in a compact set (by Lemma 8 for the sur-case). The preimage of a closed set is also closed, so all these preimages are compact. This is then an intersection of a family of compact sets, so we just need to check the finite intersection property. Fixing finitely many m, we can find an n above them all, pick an arbitrary point in the preimage of Bn, and invoke Lemma 4 (along with the nesting of the Bn) to conclude that said point lies in all lower preimages, thus demonstrating the finite intersection property. Therefore, the intersection is nonempty.
Lemma 10: Given a sequence of nonempty closed sets $B_n$ where $pr^{\pi^{n+1}_{pa},\pi^n_{pa}}_*(B_{n+1})\subseteq B_n$, and a sequence of points $M_n\in(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(B_n)$, all limit points of the sequence $M_n$ (if they exist) lie in $\bigcap_n(pr^{\pi_{pa},\pi^n_{pa}}_*)^{-1}(B_n)$. (This works in the a-surmeasure case too.)
Proof: Assume a limit point exists, isolate a subsequence limiting to it. By Lemma 4, the preimages are nested in each other. Also, the preimage of a closed set is closed. Thus, for our subsequence, past n, the points are in the preimage of Bn and don't ever leave, so the limit point is in the preimage of Bn. This argument works for all n, so the limit point is in the intersection of the preimages.
The next three lemmas are typically used in close succession to establish nirvana-free upper-completeness for the operation (which we use a lot) of projecting down a bunch of nirvana-free upper-complete sets and taking the closed convex hull of them. The first one says that the projection of a nirvana-free upper-complete set is upper-complete, the second says that convex hull preserves the property, and the third says that closure preserves the property. The first requires building up a suitable measure via recursion on conditional probabilities; the second requires building up a whole bunch of sa-measures via recursion on conditional probabilities and taking limits of them to get suitable stuff to mix together; and the third also requires building up a whole bunch of sa-measures via recursion on conditional probabilities, plus some fanciness with defining a limiting sequence.
Lemma 11: In the nirvana-free setting, a projection of an upper-complete set is upper-complete.
Proof sketch: To be precise about exactly what this says, since we're working with a-measures, it says "if you take the fragment of the upper completion composed of a-measures, and project it down, then the thing you get is: the fragment of the upper completion of (projected set) composed of a-measures". Basically, since we're not working in the full space of sa-measures, and just looking at the a-measure part of the upper completion, that's what makes this one tricky and not immediate.
The proof path is: Take an arbitrary point Mlo in the projection of B, which has been crafted by projecting down some Mhi. Given an arbitrary M′lo:=Mlo+M∗lo (assuming it's an a-measure) which lies in the upper completion of the projection of B, we need to certify that it's in the projection of B, to show that the projection of B is upper-complete. In order to do this, we craft an M∗hi and M′hi (an a-measure) s.t. Mhi+M∗hi=M′hi (certifying that M′hi is in B since B is upper-complete), and M′hi projects down to hit our M′lo point of interest.
These a-measures are crafted by starting with the base case of M∗lo and M′lo, and recursively building up the conditional probabilities in accordance with the conditional probabilities of Mhi. Then we just verify the basic conditions like the measures being well-formed, M′hi being an a-measure, M∗hi having a big enough b term, and Mhi+M∗hi=M′hi, to get our result. Working in the Nirvana-free case is nice since we don't need to worry about assigning negative measure to Nirvana.
Proof: Let B⊆Ma(FNF(πhipa)) be our upper-complete set. We want to show that prπhipa,πlopa∗(B) is upper-complete. To that end, fix a Mlo in the projection of B that's the projection of a Mhi∈B. Let M′lo:=Mlo+M∗lo, where M′lo is an a-measure. Can we find an a-measure M′hi in B that projects down to M′lo? Let's define M∗hi and M′hi as follows:
Let M′hi=(m′hi,b′hi) where b′hi is b′lo. Let M∗hi=(m∗hi,b∗hi) where b∗hi is b∗lo. Recursively define m′hi and m∗hi on h that are prefixes of something in FNF(πhipa) as follows:
If h is a prefix of something in F(πlopa), m′hi(h)=m′lo(h) and m∗hi(h)=m∗lo(h).
Otherwise, recursively define the measure components m′hi and m∗hi as:
If mhi(h)>0, then
$$m'_{hi}(hao)=\frac{m_{hi}(hao)}{m_{hi}(h)}m'_{hi}(h),\quad m^*_{hi}(hao)=\frac{m_{hi}(hao)}{m_{hi}(h)}m^*_{hi}(h)$$
If mhi(h)=0, then m′hi(hao)=m∗hi(hao)=0.
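This is the same flavor of recursion as in Lemma 7, just without the negative case. A compressed, purely illustrative sketch of the inductive step (using the same kind of dict-of-prefix-masses representation as the Lemma 7 sketch, with hypothetical names):

```python
def extend_up(m_hi, children, base_hi_prime, base_hi_star):
    """Extend m'_hi and m*_hi from the pi_lo_pa level up the pi_hi_pa tree,
    following m_hi's conditional probabilities (both are 0 wherever m_hi is 0)."""
    m_hp, m_hs = dict(base_hi_prime), dict(base_hi_star)

    def recurse(h):
        for c in children.get(h, []):
            if c not in m_hp:  # beyond the pi_lo_pa level: define it now
                ratio = m_hi[c] / m_hi[h] if m_hi[h] > 0 else 0.0
                m_hp[c], m_hs[c] = ratio * m_hp[h], ratio * m_hs[h]
            recurse(c)

    recurse(())
    return m_hp, m_hs
```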
We need to verify that m′hi has no negative parts so it's fitting for an a-measure, that mhi+m∗hi=m′hi, that the b value for M∗hi works, and that they're both well-formed signed measures. The first part is easy to establish: m′hi(h)=m′lo(h)≥0 in the base cases since M′lo is an a-measure, and a quick induction (along with mhi coming from the a-measure Mhi, so no negatives anywhere) establishes that m′hi doesn't have any negative parts.
To verify that they sum up to m′hi, observe that for base-case histories in FNF(πlopa), $m_{hi}(h)+m^*_{hi}(h)=m_{lo}(h)+m^*_{lo}(h)=m'_{lo}(h)=m'_{hi}(h)$. Then, in the general case, we can use induction (assume it's true for h) and go:
If mhi(h)>0, then
$$m_{hi}(hao)+m^*_{hi}(hao)=m_{hi}(hao)+\frac{m_{hi}(hao)}{m_{hi}(h)}m^*_{hi}(h)=\frac{m_{hi}(hao)}{m_{hi}(h)}(m_{hi}(h)+m^*_{hi}(h))$$
$$=\frac{m_{hi}(hao)}{m_{hi}(h)}m'_{hi}(h)=m'_{hi}(hao)$$
If mhi(h)=0, then mhi(hao)=0, so
$$m_{hi}(hao)+m^*_{hi}(hao)=0+0=0=m'_{hi}(hao)$$
What about checking that they're well-formed measures? To do this, it suffices to check that summing up their measure-mass over haoi gets the measure-mass over h. If mhi(