Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel K:S×Aik→O×S which fulfills the niceness conditions, and there's a starting infradistribution ψ∈□S, unrolling it into an infrakernel via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞) has the infrakernel Θ being pseudocausal and fulfilling all belief function conditions except for normalization.

This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:

1:The Θ(π) have a uniform upper bound on their Lipschitz constants.

2: π↦Θ(π)(U) is lower-semicontinuous for all U

3: Θ(π) is supported entirely on histories compatible with π.

4: All the Θ(π) agree on the value of 1 (or infinity).

5: Θ is pseudocausal.

Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.

And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.

T4.1 Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).

Remember that Kπn is defined as

Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))

Obviously it produces inframeasures, since K does.

T4.1.1 For lower-semicontinuity in the input, we have

liminfm→∞Kπn(sm0,aosm1:n)(f)

=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)

=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))

and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into

=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))

and then, by lower-semicontinuity for K (one of the niceness conditions)

≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))

and then we pack this back up as

=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)

and lower-semicontinuity has been shown.

T4.1.2 For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.

T4.1.3 For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.

T4.1.4 For mapping constants to above constants, we have

Kπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)

=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c

By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.

T4.2 Now we can get working on showing the belief function conditions.

T4.2.1 For showing a bounded Lipschitz constant, we have

and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get

≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)

and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.

T4.2.2 Now for lower-semicontinuity for Θ, which is the extremely hard one.

The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels simultaneously. Then, using that, we'll show that

π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that

π↦Θ(π)(U) is lower-semicontinuous, from that.

T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:

We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.

Now, let's assume there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).

The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.

So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).

In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function

Ξ:(0,∞)×A×K(S)→K(O,S)

What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.

Then, we recursively define the C⊤i[CS,ϵ] as:

C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)

and

C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))

All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.

Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.

For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.

and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)

So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is

≤ϵ2d(f,f′)

Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.

Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.

To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:

We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as

Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being

Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))

which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of

which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to

=ϵ(1−12n+2)d(f,f′)

Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result

And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.

T4.2.2.2 For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with

liminfm→∞(ψ⋉Kπm:∞)(f)

Which unpacks as

=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))

If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get

In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get

Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function

(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b

over a compact set. Said function is lower-semicontinuous because

λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))

is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.

Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as

Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get

At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where all the mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)

Now, we split the measures mi into the component of measure on CS, and the component outside of it.

Theorem 4:K:S×Aik→O×SPseudocausal IPOMDP's:If an infra-POMDP has a transition kernelwhich fulfills the niceness conditions, and there's a starting infradistributionψ∈□S, unrolling it into an infrakernel viaΘ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞)has the infrakernelΘbeing pseudocausal and fulfilling all belief function conditions except for normalization.This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:

1:The Θ(π) have a uniform upper bound on their Lipschitz constants.

2: π↦Θ(π)(U) is lower-semicontinuous for all U

3: Θ(π) is supported entirely on histories compatible with π.

4: All the Θ(π) agree on the value of 1 (or infinity).

5: Θ is pseudocausal.

Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.

And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.

T4.1Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).Remember that Kπn is defined as

Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))

Obviously it produces inframeasures, since K does.

T4.1.1For lower-semicontinuity in the input, we haveliminfm→∞Kπn(sm0,aosm1:n)(f)

=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)

=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))

and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into

=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))

and then, by lower-semicontinuity for K (one of the niceness conditions)

≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))

and then we pack this back up as

=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)

and lower-semicontinuity has been shown.

T4.1.2For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.T4.1.3For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.T4.1.4For mapping constants to above constants, we haveKπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)

=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c

By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.

T4.2Now we can get working on showing the belief function conditions.T4.2.1For showing a bounded Lipschitz constant, we have|Θ(π)(U)−Θ(π)(U′)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(U)−pr(A×O)ω∗(ψ⋉Kπ:∞)(U′)|

=|ψ(λs0.Kπ:∞(s0)(λaos1:∞.U(ao1:∞)))−ψ(λs0.Kπ:∞(s0)(λaos1:∞.U′(ao1:∞)))|

and then since ψ is an infradistribution, it must have a finite Lipschitz constant, λ⊙. So we then get

≤λ⊙⋅sups0|Kπ:∞(s0)(λaos1:∞.U(ao1:∞))−Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))|

and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get

≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)

and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.

T4.2.2Now for lower-semicontinuity for Θ, which is the extremely hard one.The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels

simultaneously. Then, using that, we'll show thatπ↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that

π↦Θ(π)(U) is lower-semicontinuous, from that.

T4.2.2.1Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:∀CS∈K(S),ϵ>0:∃¯¯¯¯C∈∏∞i=1K(A×O×S):∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):

Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈∏∞i=n+m+2Cif(aos1:n+m+1,aosn+m+2:∞))

≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈∏∞i=n+2Cif(aos1:n+1,aosn+2:∞))−4ϵ||f||

We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.

Now, let's

assumethere's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).Then our proof goal becomes

∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):

Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))

≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||

The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.

So, where we're at is that our variant of the Almost-Monotone-Lemma works,

as long asour assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function

Ξ:(0,∞)×A×K(S)→K(O,S)

What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.

Then, we recursively define the C⊤i[CS,ϵ] as:

C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)

and

C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))

All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.

Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.

For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.

|Kπ:0(s0)(f)−Kπ:0(s0)(f′)|=|Kπ0(s0)(f)−Kπ0(s0)(f′)|

=|(δπ()⋉(λa.K(s0,a)))(λa,o,s.f(a,o,s))−(δπ()⋉(λa.K(s0,a)))(λa,o,s.f′(a,o,s))|

=|K(s0,π())(λo,s.f(π(),o,s))−K(s0,π())(λo,s.f′(π(),o,s))|

Now, Ξ(ϵ2,π(),CS) is a ϵ2-almost-support for K(s0,π()) because s0∈CS, so we can apply Lemma 2 from LBIT, and 1-Lipschitzness of K to split this up as

≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2supo,s|f(π(),o,s)−f′(π(),o,s)|

That second term can be upper bounded by d(f,f′), so we get an upper bound of

≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2d(f,f′)

and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)

So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is

≤ϵ2d(f,f′)

Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.

Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.

To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:

|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|

=|Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s)))

−Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s)))|

and then unpack to get

=|Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f(aos1:n+1,a,o,s)))

−Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f′(aos1:n+1,a,o,s)))|

and unpack the semidirect product and substitute the dirac-delta value in to get

=|Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s)))

−Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s)))|

We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as

≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))

−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|

+ϵ(1−12n+1)supaos1:n+1|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))

−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|

By 1-Lipschitzness of K, we get an upper-bound of

≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))

−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|

+ϵ(1−12n+1)supaos1:n+1supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|

And then this is upper-boundable by d(f,f′), so we get

≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))

−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|+ϵ(1−12n+1)d(f,f′)

Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being

Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))

which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of

≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))

|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|

+ϵ2n+2supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)

+ϵ(1−12n+1)d(f,f′)

That second term is upper-bounded by d(f,f′) no matter what, so we have

≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))

|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|

+ϵ2n+2d(f,f′))+ϵ(1−12n+1)d(f,f′)

Pull out the constant and combine, to get

=supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))

|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)

+ϵ(1−12n+2)d(f,f′)

Now, f,f′ were assumed to agree on C⊤1:n+2[CS,ϵ], which factors as

C⊤1:n+1[CS,ϵ]×C⊤n+2[CS,ϵ]

and then expands as

C⊤1:n+1[CS,ϵ]×(A×⋃aΞ(ϵ2n+2,a,prSn+1(C⊤1:n+1[CS,ϵ])))

which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to

=ϵ(1−12n+2)d(f,f′)

Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result

∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):

Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))

≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||

In particular, since any defining sequence for Kπ:∞(s0) has an actual limit, this can be trivially reshuffled to attain the result:

∀π∈Π,n∗∈N,s0∈CS,f∈CB((A×O×S)ω):

limn→∞Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))

≥Kπ:n∗(s0)(λaos1:n∗+1.infaosn∗+2:∞∈C⊤n∗+2:∞[CS,ϵ]f(aos1:n∗+1,aosn∗+2:∞))−4ϵ||f||

And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.

T4.2.2.2For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start withliminfm→∞(ψ⋉Kπm:∞)(f)

Which unpacks as

=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))

If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get

=liminfm→∞(inf(m,b)∈Ψ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))

And, since we can pick amongst minimal points, we can rewrite this as

=liminfm→∞(inf(m,b)∈Ψmin(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))

In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get

=liminfm→∞(inf(m,b)∈CΨ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))

And then we unpack this as an integral.

=liminfm→∞(inf(m,b)∈CΨ(∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b))

Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function

(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b

over a compact set. Said function is lower-semicontinuous because

λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))

is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.

Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as

=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi+bi)

Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get

=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi)+b∞

At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where

allthe mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)Now, we split the measures mi into the component of measure on CS, and the component outside of it.

=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,CS

+∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,¬C