Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel K:S×Aik→O×S which fulfills the niceness conditions, and there's a starting infradistribution ψ∈□S, unrolling it into an infrakernel via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞) has the infrakernel Θ being pseudocausal and fulfilling all belief function conditions except for normalization.
This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:
1:The Θ(π) have a uniform upper bound on their Lipschitz constants.
2: π↦Θ(π)(U) is lower-semicontinuous for all U
3: Θ(π) is supported entirely on histories compatible with π.
4: All the Θ(π) agree on the value of 1 (or infinity).
5: Θ is pseudocausal.
Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.
And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.
T4.1 Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).
Remember that Kπn is defined as
Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))
Obviously it produces inframeasures, since K does.
T4.1.1 For lower-semicontinuity in the input, we have
liminfm→∞Kπn(sm0,aosm1:n)(f)
=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)
=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))
and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into
=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then, by lower-semicontinuity for K (one of the niceness conditions)
≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then we pack this back up as
=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)
and lower-semicontinuity has been shown.
T4.1.2 For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.
T4.1.3 For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.
T4.1.4 For mapping constants to above constants, we have
Kπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)
=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c
By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.
T4.2 Now we can get working on showing the belief function conditions.
T4.2.1 For showing a bounded Lipschitz constant, we have
and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get
≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)
and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.
T4.2.2 Now for lower-semicontinuity for Θ, which is the extremely hard one.
The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels simultaneously. Then, using that, we'll show that
π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that
π↦Θ(π)(U) is lower-semicontinuous, from that.
T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:
We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.
Now, let's assume there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).
The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.
So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).
In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function
Ξ:(0,∞)×A×K(S)→K(O,S)
What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.
Then, we recursively define the C⊤i[CS,ϵ] as:
C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)
and
C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))
All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.
Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.
For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.
and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)
So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is
≤ϵ2d(f,f′)
Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.
Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.
To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:
We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as
Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being
Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of
which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to
=ϵ(1−12n+2)d(f,f′)
Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result
And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.
T4.2.2.2 For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with
liminfm→∞(ψ⋉Kπm:∞)(f)
Which unpacks as
=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))
If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get
In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get
Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function
(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b
over a compact set. Said function is lower-semicontinuous because
λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))
is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.
Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as
Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get
At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where all the mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)
Now, we split the measures mi into the component of measure on CS, and the component outside of it.
Because CS is a compact subset of S, we can use it as a seed set for our C⊤n[CS,ϵ] sequence that attains almost-monotonicity. To economize on space for the lines to fit, we'll write
infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
as just
infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
We unpack the defining limit of the infinite semidirect product with our relevant sequence of compact sets to get
It is exceptionally important to note here that we can't just throw our variant of the Almost-Monotone Lemma at this, because we don't have a promise thats0∈CS.
An easy way to show this is to show it for all n≥n∗. So let n≥n∗. Then, our proof goal is that
Now, to break down the integral on the top. Since the behavior of the interior of that integral only depends on s0∈CS, we can apply our good old Almost-Monotone-Lemma variation, and lower-bound the limit by the n∗ finite stage, minus 4ϵ||f|| (independently of the choice of πi) yielding
Now, we pull that constant piece out of the integral, and since both functions in the integral are now identical, we can stitch the measure components back together to get:
Now, mi,CS is a fragment of mi, which came from a minimal point in Ψ or a limit of such, so the maximal amount of measure present would be λ⊙. This lets us get a lower bound of
For all n∗. Regardless of n∗, this liminf value in i is finite because it's upper-bounded by the liminf in m, which is assumed to be finite (we disposed of the infinite case already). Because the sequence mi and the compact sets C⊤n[CS,ϵ] were fixed before we picked n∗, we can take a limit of the n∗ without worrying about it affecting any of the stuff on the inside of the integral, to get
This liminf in i of the integral is finite for all n. At this point, we can notice something interesting. For fixed n, when we take the limit as i approaches infinity, eventually Kπi:n will become Kπ∞:n and never go back, because of convergence of the policies, and the fact that the infrakernel has a cutoff at time n, so the behavior of the policies before that time will stabilize on a partial policy. So, we can rewrite our lower bound as:
is lower-semicontinuous in s0 (as all the finite stages of the infinite infrakernel have that niceness property). And m↦m(f∗) is lower-semicontinuous when f∗ is. Also, by now, the only thing i is controlling is the measure term. Therefore, that liminf of the integral (in i) is greater than what you'd get if you just passed to m∞. So, we get
(any sequence of compact sets works to define the infinite semidirect product) and for all n, the integrals are finite (this is why we kept restating it over and over!), we can apply Fatou's Lemma to move the liminf into the integral, to get
But because ϵ was arbitrary and ||f|| and λ⊙ are finite, we have our desired net result of
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)
And lower-semicontinuity is shown for π↦(ψ⋉Kπ:∞)(f)
T4.2.2.3 Now for our phase 3, getting lower-semicontinuity of π↦Θ(π)(U). Let πm limit to π∞. We have
liminfm→∞Θ(πm)(U)
and then this unpacks as
=liminfm→∞pr(A×O)ω∗(ψ⋉Kπm:∞)(U)
Reexpressing the projection, we have:
=liminfm→∞(ψ⋉Kπm:∞)(λs0,aos1:∞.U(ao1:∞))
And then, we can apply lower-semicontinuity of π↦(ψ⋉Kπ:∞)(f) which we've shown in phase 2, to get
≥(ψ⋉Kπ∞:∞)(λs0,aos1:∞.U(ao1:∞))
Which then packs up as
=pr(A×O)ω∗(ψ⋉Kπ∞:∞)(U)
and then
=Θ(π∞)(U)
and we have lower-semicontinuity for Θ! Finally!
T4.2.3 Time to return to easier fare. We need that Θ(π) is supported entirely on histories compatible with π. To do this, we proceed in three steps.
First, we do an induction proof that, for any given π and s0, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the actions and observations so far are compatible with π.
Second, we use our first result to show that for any two continuous bounded utility functions U,U′ which only depend on the action/observation sequence, and agree on all histories compatible with π,
Finally, we use this result to show that if U,U′ agree on all histories compatible with π, Θ(π)(U)=Θ(π)(U′), showing that Θ(π) is supported on histories compatible with π.
T4.2.3.1 Let's start with the induction proof that Kπ:n(s0) has the subset of (A×O×S)n+1 where the history of actions and observations is compatible with π as a support. The way to do that is to pick two functions that are identical on said subset, and show they have equal expectation values. For the base case of the induction proof, fix an f and f′ which are identical on {π()}×O×S (as that's the subset of A×O×S which is compatible with π), and observe:
Kπ:0(s0)(f)=Kπ0(s0)(λa,o,s.f(a,o,s))
and then by how Kπ0(s0) is defined, we have
=(δπ()⋉(λa.K(s0,a)))(λa′,o,s.f(a′,o,s))
Which, by unpacking the semidirect product and subsituting the dirac-delta value in, we have
=K(s0,π())(λo,s.f(π(),o,s))
and then since f=f′ when the initial action equals π(), we have
=K(s0,π())(λo,s.f′(π(),o,s))
and then this just wraps back up by our sequence of rewrite steps in reverse as
=Kπ:0(s0)(f′)
Now that we have the base case out of the way, we'll attempt to show the induction step. Let f and f′ be identical on the subset of (A×O×S)n+2 which is compatible with π, and we'll show they have equal expectation value. Our job is to unpack
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
this unpacks, by how sequences of semidirect products are defined, as
Now, since, by induction assumption, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the history is compatible with π, we just need to show that the inner functions are identical on that subset to get that the difference is 0, so our proof goal is now
Theorem 4: Pseudocausal IPOMDP's: If an infra-POMDP has a transition kernel K:S×Aik→O×S which fulfills the niceness conditions, and there's a starting infradistribution ψ∈□S, unrolling it into an infrakernel via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞) has the infrakernel Θ being pseudocausal and fulfilling all belief function conditions except for normalization.
This takes a whole lot of work to show. The proof breaks down into two phases. The first is showing that the Kπn fulfill the niceness conditions (which splits into four subphases of showing 1-Lipschitzness, lower-semicontinuity, compact-shared CAS, and moving constants up/mapping 1 to 1). This is the easy part to do. The second phase is much longer. It splits into five pieces, where we show our list of conditions for Θ. Our list of conditions to show is:
1:The Θ(π) have a uniform upper bound on their Lipschitz constants.
2: π↦Θ(π)(U) is lower-semicontinuous for all U
3: Θ(π) is supported entirely on histories compatible with π.
4: All the Θ(π) agree on the value of 1 (or infinity).
5: Θ is pseudocausal.
Parts 1 and 4 can be done with little trouble, but parts 2, 3, and 5 split into additional pieces. To show pseudocausality, ie, condition 5, we have to start with our proof goal and keep rewriting it into an easier form, until we get something we can knock out with an induction proof, which splits into a base case and an induction step. To show sensible supports, ie, part 3, we have to split it into three parts, there's one induction proof for the finite Kπ:n, showing sensible supports for Kπ:∞, and then finally extending this to the result we actually want, which is fairly easy.
And then there's lower-semicontinuity, part 2, which is an absolute nightmare. This splits into three parts. One is explicitly constructing a sequence of compact sets to use in the Almost-Monotone Lemma for any policy whatsoever, and then there's trying to show that π↦(ψ⋉Kπ:∞)(f) is lower-semimicontinuous which requires a massive amount of messing around with integrals and limits, and then it's pretty easy to show lower-semicontinuity for Θ from there.
T4.1 Our first order of business is showing the five niceness conditions for all the Kπn infrakernels, so we can take the infinite semidirect product and show that Kπ:∞ inherits the five niceness conditions (from the proof of Theorem 1).
Remember that Kπn is defined as
Kπn(s0,aos1:n):=δπ(ao1:n)⋉(λa.K(sn,a))
Obviously it produces inframeasures, since K does.
T4.1.1 For lower-semicontinuity in the input, we have
liminfm→∞Kπn(sm0,aosm1:n)(f)
=liminfm→∞(δπ(aom1:n)⋉(λa.K(smn,a)))(f)
=liminfm→∞K(smn,π(aom1:n))(λo,s.f(π(aom1:n),o,s))
and then, as aosm1:n limits to aos∞1:n, we have aom1:n=ao∞1:n forever after some finite m, because ∏i=ni=1(A×O) is a finite space. So, it turns into
=liminfm→∞K(smn,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then, by lower-semicontinuity for K (one of the niceness conditions)
≥K(s∞n,π(ao∞1:n))(λo,s.f(π(ao∞1:n),o,s))
and then we pack this back up as
=δπ(aom1:n)⋉(λa.K(s∞n,a))(f)=Kπn(s∞0,aos∞1:n)(f)
and lower-semicontinuity has been shown.
T4.1.2 For 1-Lipschitzness, observe that δπ(ao1:n)⋉(λa.K(sn,a)) is 1-Lipschitz because K is 1-Lipschitz, and this is the semidirect product of a 1-Lipschitz (all probability distributions are 1-Lipschitz, and this is a dirac-delta distribution) infradistribution with a 1-Lipschitz infrakernel.
T4.1.3 For compact-shared compact-almost-support, observe that (s0,aos1:n)↦(sn,π(ao1:n)) is a continuous function, so any compact subset of S×(A×O×S)n (input to Kπn) maps to a compact subset of S×A (corresponding inputs for K). Then, we can apply compact-shared compact-almost-support for K (one of the niceness conditions) to get a compact almost-support on O×S, and take the product of that subset with all of A itself, to get a shared-compact almost-support for Kπn.
T4.1.4 For mapping constants to above constants, we have
Kπn(s0,aos1:n)(c)=(δπ(ao1:n)⋉(λa.K(sn,a)))(c)
=δπ(ao1:n)(λa.K(sn,a)(c))=K(sn,π(ao1:n))(c)≥c
By increasing constants for K (one of the niceness conditions). This same proof works with equality when c=1, because of the "map 1 to 1" condition on the [0,1] type signature. So all the Kπn have the niceness properties, and we can form the infinite semidirect product at all, and the infinite semidirect product inherits these niceness conditions.
T4.2 Now we can get working on showing the belief function conditions.
T4.2.1 For showing a bounded Lipschitz constant, we have
|Θ(π)(U)−Θ(π)(U′)|=|pr(A×O)ω∗(ψ⋉Kπ:∞)(U)−pr(A×O)ω∗(ψ⋉Kπ:∞)(U′)|
=|ψ(λs0.Kπ:∞(s0)(λaos1:∞.U(ao1:∞)))−ψ(λs0.Kπ:∞(s0)(λaos1:∞.U′(ao1:∞)))|
and then since ψ is an infradistribution, it must have a finite Lipschitz constant, λ⊙. So we then get
≤λ⊙⋅sups0|Kπ:∞(s0)(λaos1:∞.U(ao1:∞))−Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))|
and then, since all the Kπ:∞(s0) are 1-Lipschitz (infinite semidirect product has niceness conditions), we get
≤λ⊙supaos1:∞|U(ao1:∞)−U′(ao1:∞)|=λ⊙⋅d(U,U′)
and we're done, we showed a Lipschitz bound on the Θ(π) that is uniform in π.
T4.2.2 Now for lower-semicontinuity for Θ, which is the extremely hard one.
The way this proof will work is that we'll first show a variant of the Almost-Monotone Lemma that works for all the Kπ:∞ infrakernels simultaneously. Then, using that, we'll show that
π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous. And then, it's pretty easy to wrap up and get that
π↦Θ(π)(U) is lower-semicontinuous, from that.
T4.2.2.1 Anyways, our first goal is a variant of the Almost-Monotone Lemma. The particular variant we're searching for, specialized to our particular case, is:
∀CS∈K(S),ϵ>0:∃¯¯¯¯C∈∏∞i=1K(A×O×S):∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈∏∞i=n+m+2Cif(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈∏∞i=n+2Cif(aos1:n+1,aosn+2:∞))−4ϵ||f||
We let our CS be an arbitrary compact subset of states, and ϵ be arbitrary.
Now, let's assume there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π, C⊤1:n+1[CS,ϵ] is a shared ϵ(1−12n+1)-almost-support for all the Kπ:n(s0) inframeasures (where s0∈CS).
Then our proof goal becomes
∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||
The proof of the Almost-Monotone Lemma goes through perfectly fine, the key part where our choice of compact sets matters is in the ability to make the assumption that C⊤1:n+m+1[CS,ϵ] is an ϵ(1−12n+m+1)-almost-support for all the Kπ:n+m(s0) with s0∈CS, as our s0 is. And we assumed our sequence of compact sets fulfilled that property.
So, where we're at is that our variant of the Almost-Monotone-Lemma works, as long as our assumption works out, that there's a sequence of compact sets C⊤i[CS,ϵ] where, regardless of n or π or s0∈CS, C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0).
In order to get our variant of the AML to go through, we really need to prove that there is such a sequence, for any ϵ and CS. For this, we'll be looking at a function
Ξ:(0,∞)×A×K(S)→K(O,S)
What this does is that Ξ(ϵ,a,CS) is defined to return a compact subset of O×S which is an ϵ-almost-support for all the K(a,s) inframeasures, when s∈CS. There are always compact subsets like this for any input because of shared almost-compact-support for the infrakernel K (one of the assumed niceness conditions), so there does indeed exist a function Ξ with these properties.
Then, we recursively define the C⊤i[CS,ϵ] as:
C⊤1[CS,ϵ]:=A×⋃aΞ(ϵ2,a,CS)
and
C⊤n+1[CS,ϵ]:=A×⋃aΞ(ϵ2n+1,a,prSn(C⊤1:n[CS,ϵ]))
All these sets are compact, by induction. For the first one, since Ξ always returns compact sets, we're taking a finite union of compact sets (is compact), and taking the product with a finite set (is compact). For induction up, since the product of compact sets is compact, C⊤1:n[CS,ϵ] is compact, and then projections of compact sets are compact, so then Ξ returns a compact set, and again, we take a finite union of compact sets and then the product with a compact set to make another compact set.
Now, let's get started on showing that C^{\top}_{1:n+1}[C^{S},\eps] is an \eps\left(1-\frac{1}{2^{n+1}}\right)-almost-support for K^{\pi}_{:n}(s_0), for arbitrary \pi and s_0\in C^{S}. This proof will proceed by induction.
For the base-case, we want that C⊤1[CS,ϵ] is a ϵ2-almost-support for Kπ:n(s0) with s0∈CS. Well, assume f,f′ are equal on C⊤1[CS,ϵ], and s0∈CS.
|Kπ:0(s0)(f)−Kπ:0(s0)(f′)|=|Kπ0(s0)(f)−Kπ0(s0)(f′)|
=|(δπ()⋉(λa.K(s0,a)))(λa,o,s.f(a,o,s))−(δπ()⋉(λa.K(s0,a)))(λa,o,s.f′(a,o,s))|
=|K(s0,π())(λo,s.f(π(),o,s))−K(s0,π())(λo,s.f′(π(),o,s))|
Now, Ξ(ϵ2,π(),CS) is a ϵ2-almost-support for K(s0,π()) because s0∈CS, so we can apply Lemma 2 from LBIT, and 1-Lipschitzness of K to split this up as
≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2supo,s|f(π(),o,s)−f′(π(),o,s)|
That second term can be upper bounded by d(f,f′), so we get an upper bound of
≤supo,s∈Ξ(ϵ2,π(),CS)|f(π(),o,s)−f′(π(),o,s)|+ϵ2d(f,f′)
and then we observe that since f=f′ on C⊤1[CS,ϵ], and that set is defined as A×⋃aΞ(ϵ2,a,CS) , we have that they agree on {π()}×Ξ(ϵ2,π(),CS)
So that term turns into 0, the two functions are equal on our set of interest, so our net upper bound is
≤ϵ2d(f,f′)
Showing that C⊤1[CS,ϵ] is a ϵ2-almost support for all the Kπ:0(s0), where π is arbitrary and s_0\in C^{S}.
Now for induction. We want to show that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for Kπ:n+1(s0) with s0∈CS, assuming that C⊤1:n+1[CS,ϵ] is a ϵ(1−12n+1)-almost-support for Kπ:n(s0) with s0∈CS.
To begin, assume f,f′ are equal on C⊤1:n+2[CS,ϵ] and s0∈CS. Then we can go:
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
=|Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s)))|
and then unpack to get
=|Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.(δπ(ao1:n+1)⋉(λa.K(sn+1,a)))(λa,o,s.f′(aos1:n+1,a,o,s)))|
and unpack the semidirect product and substitute the dirac-delta value in to get
=|Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s)))
−Kπ:n(s0)(λaos1:n+1.K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s)))|
We apply a Lemma 2 from LBIT decomposition, with C⊤1:n+1[CS,ϵ] as an ϵ(1−12n+1)-almost support for Kπ:n(s0), which works by induction assumption. Since Kπ:n(s0) is also 1-Lipschitz (the niceness conditions we already know), the Lemma 2 upper bound breaks down as
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
+ϵ(1−12n+1)supaos1:n+1|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
By 1-Lipschitzness of K, we get an upper-bound of
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|
+ϵ(1−12n+1)supaos1:n+1supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
And then this is upper-boundable by d(f,f′), so we get
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ]|K(sn+1,π(ao1:n+1))(λo,s.f(aos1:n+1,π(ao1:n+1),o,s))
−K(sn+1,π(ao1:n+1))(λo,s.f′(aos1:n+1,π(ao1:n+1),o,s))|+ϵ(1−12n+1)d(f,f′)
Now, for that top term, we know that sn+1 comes from prSn+1(C⊤1:n+1[CS,ϵ]). We apply another Lemma 2 decomposition on K this time, with the compact set of interest being
Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
which is a ϵ(12n+2)-almost-support for K(sn+1,π(ao1:n+1)). Pairing this with 1-Lipschitzness of K, we have an upper bound of
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
+ϵ2n+2supo,s|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)
+ϵ(1−12n+1)d(f,f′)
That second term is upper-bounded by d(f,f′) no matter what, so we have
≤supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|
+ϵ2n+2d(f,f′))+ϵ(1−12n+1)d(f,f′)
Pull out the constant and combine, to get
=supaos1:n+1∈C⊤1:n+1[CS,ϵ](supo,s∈Ξ(ϵ2n+2,π(ao1:n+1),prSn+1(C⊤1:n+1[CS,ϵ]))
|f(aos1:n+1,π(ao1:n+1),o,s)−f′(aos1:n+1,π(ao1:n+1),o,s)|)
+ϵ(1−12n+2)d(f,f′)
Now, f,f′ were assumed to agree on C⊤1:n+2[CS,ϵ], which factors as
C⊤1:n+1[CS,ϵ]×C⊤n+2[CS,ϵ]
and then expands as
C⊤1:n+1[CS,ϵ]×(A×⋃aΞ(ϵ2n+2,a,prSn+1(C⊤1:n+1[CS,ϵ])))
which all our aos1:n+1,π(ao1:n+1),o,s we're feeding in lie within. So those functions are identical for those inputs, and our upper-bound just reduces to
=ϵ(1−12n+2)d(f,f′)
Which witnesses that C⊤1:n+2[CS,ϵ] is a ϵ(1−12n+2)-almost-support for any Kπ:n+1(s0) (arbitrary π) as long as s0∈CS. So our induction step goes off without a hitch, and we now know that the C⊤i[CS,ϵ]sequence is suitable to get our modification of the Almost-Monotone-Lemma to work out. We have the result
∀π∈Π,n,m∈N,s0∈CS,f∈CB((A×O×S)ω):
Kπ:n+m(s0)(λaos1:n+m+1.infaosn+m+2:∞∈C⊤n+m+2:∞[CS,ϵ]f(aos1:n+m+1,aosn+m+2:∞))
≥Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))−4ϵ||f||
In particular, since any defining sequence for Kπ:∞(s0) has an actual limit, this can be trivially reshuffled to attain the result:
∀π∈Π,n∗∈N,s0∈CS,f∈CB((A×O×S)ω):
limn→∞Kπ:n(s0)(λaos1:n+1.infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(aos1:n+1,aosn+2:∞))
≥Kπ:n∗(s0)(λaos1:n∗+1.infaosn∗+2:∞∈C⊤n∗+2:∞[CS,ϵ]f(aos1:n∗+1,aosn∗+2:∞))−4ϵ||f||
And remember, the sequence of compact sets only depends on CS and ϵ, not on the policy. This works for any policy. Now that we have this result in the bag, we can move on to phase 2.
T4.2.2.2 For phase 2, we'll show that π↦(ψ⋉Kπ:∞)(f) is lower-semicontinuous, and dear lord this is going to be hard and take a lot of integrals and limits to show. We start with
liminfm→∞(ψ⋉Kπm:∞)(f)
Which unpacks as
=liminfm→∞ψ(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))
If this liminf is infinity, we're just done, infinity beats everything. So we'll be assuming that it settles down to a finite value. We can unpack the ψ as a challenge of picking points in Ψ, so we get
=liminfm→∞(inf(m,b)∈Ψ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
And, since we can pick amongst minimal points, we can rewrite this as
=liminfm→∞(inf(m,b)∈Ψmin(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
In the [0,1] case, the set of minimal points is precompact. In the R case, it may not be precompact, but the only way it fails to be precompact is having b values running off to infinity. Said λs0.Kπm:∞(s0)(f) function can't be unbounded below, because it is above Kπm:∞(s0)(−||f||), and infinite semidirect products map constants to greater than their usual value, and Kπm:∞(s0) fulfills this condition. Thus, if the b value of a selected minimal point is far higher than the finite liminf value plus λ⊙||f||, the value said minimal point assigns to the function of interest is far higher than the true minimal value. So, when minimizing, we can restrict to a compact subset of Ψ to attain the min, call it CΨ. With this, we get
=liminfm→∞(inf(m,b)∈CΨ(m(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b))
And then we unpack this as an integral.
=liminfm→∞(inf(m,b)∈CΨ(∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b))
Now, we can pass to two subsequences. First, we can pass to a subsequence of the m that actually limits to the liminf value. On this subsequence, the policies still limit to π∞. Also, for each member of this subsequence, we can find an true minimizing (m,b) pair for it since we're minimizing the lower-semicontinuous function
(m,b)↦∫S(λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm+b
over a compact set. Said function is lower-semicontinuous because
λs0.Kπm:∞(s0)(λaos1:∞.f(s0,aos1:∞))
is lower-semicontinuous, it's one of the niceness properties that the infinite infrakernel inherits.
Since we're selecting a-measures from the compact set CΨ, we can find a subsequence of our subsequence where the a-measures converge. Now we'll be using i as our index, and using it to index the (m,b) pairs. The limit a-measure is (m∞,b∞). So, we have a rewrite as
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi+bi)
Further, all numbers in this limit are finite, because we're operating in the case where the liminf is finite, and i indexes a subsequence which actually limits to the liminf. For said limit, bi limits to b∞, so we can pull that out of the limit, to get
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi)+b∞
At this point, since the mi limit to m∞, the set of the measure components {mi}i∈N⊔{∞} is compact in the space of measures on S. In particular, this implies that there is some compact set CS⊆S where all the mi only have ϵ or less measure outside said set (a necessary condition for precompactness of a set of measures)
Now, we split the measures mi into the component of measure on CS, and the component outside of it.
=limi→∞(∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,CS
+∫S(λs0.Kπi:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dmi,¬CS)+b∞
Because CS is a compact subset of S, we can use it as a seed set for our C⊤n[CS,ϵ] sequence that attains almost-monotonicity. To economize on space for the lines to fit, we'll write
infaosn+2:∞∈C⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
as just
infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
We unpack the defining limit of the infinite semidirect product with our relevant sequence of compact sets to get
=limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.limn→∞(Kπm:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,¬CS)+b∞
We'll now be splitting up the second integral as two more integrals. Fix an arbitrary n∗.
=limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)+b∞
We're really in the weeds here, let's see if we can make any dent whatsoever. What we'll be showing is that, regardless of s0,
limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
It is exceptionally important to note here that we can't just throw our variant of the Almost-Monotone Lemma at this, because we don't have a promise that s0∈CS.
An easy way to show this is to show it for all n≥n∗. So let n≥n∗. Then, our proof goal is that
Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
First, observe that the functions
λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)
and
λaos1:n+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)
can only differ on the same input by at most 2||f||. Since the finite stages are all 1-Lipschitz or less, we have
Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
≥Kπi:n(s0)(λaos1:n+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−2||f||
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
And then, by monotonicity of the infrakernel stages for functions which cut off at a finite point (it was from Theorem 1), we have
≥Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−2||f||
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))
=−2||f||
And we're done, we have shown that, regardless of s0, we have
limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))≥−2||f||
Now, applying this fact to our giant list of integrals, we can take
limi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))
−Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)+b∞
and apply our lower-bound to get
≥liminfi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS
+∫S(−2||f||)dmi,¬CS)+b∞
Also, since we picked CS so mi only has ϵ or less measure outside of it, regardless of i, we can impose a lower bound of:
≥liminfi→∞(∫S(λs0.limn→∞(Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))))dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)−2ϵ||f||+b∞
Now, to break down the integral on the top. Since the behavior of the interior of that integral only depends on s0∈CS, we can apply our good old Almost-Monotone-Lemma variation, and lower-bound the limit by the n∗ finite stage, minus 4ϵ||f|| (independently of the choice of πi) yielding
≥liminfi→∞(∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞))−4ϵ||f||)dmi,CS
+∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi,¬CS)−2ϵ||f||+b∞
Now, we pull that constant piece out of the integral, and since both functions in the integral are now identical, we can stitch the measure components back together to get:
=liminfi→∞(∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
+∫S(−4ϵ||f||)dmi,CS)−2ϵ||f||+b∞
Now, mi,CS is a fragment of mi, which came from a minimal point in Ψ or a limit of such, so the maximal amount of measure present would be λ⊙. This lets us get a lower bound of
≥liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−4ϵλ⊙||f||−2ϵ||f||+b∞
Doing a little bit of regrouping, we have
=liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
Recapping what we've done so far, we showed that
liminfm→∞(ψ⋉Kπm:∞)(f)
≥liminfi→∞∫S(λs0.Kπi:n∗(s0)(λaos1:n∗+1.infC⊤n∗+2:∞[CS,ϵ]f(s0,aos1:n∗+1,aosn∗+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
For all n∗. Regardless of n∗, this liminf value in i is finite because it's upper-bounded by the liminf in m, which is assumed to be finite (we disposed of the infinite case already). Because the sequence mi and the compact sets C⊤n[CS,ϵ] were fixed before we picked n∗, we can take a limit of the n∗ without worrying about it affecting any of the stuff on the inside of the integral, to get
liminfm→∞(ψ⋉Kπm:∞)(f)
≥liminfn→∞liminfi→∞∫S(λs0.Kπi:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
This liminf in i of the integral is finite for all n. At this point, we can notice something interesting. For fixed n, when we take the limit as i approaches infinity, eventually Kπi:n will become Kπ∞:n and never go back, because of convergence of the policies, and the fact that the infrakernel has a cutoff at time n, so the behavior of the policies before that time will stabilize on a partial policy. So, we can rewrite our lower bound as:
=liminfn→∞liminfi→∞∫S(λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dmi
−2ϵ||f||(2λ⊙+1)+b∞
Again, for all n, this liminf in i quantity is finite. Now, the function
λs0.Kπ∞:n(s0)(λaos1:n+1.infCn+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
is lower-semicontinuous in s0 (as all the finite stages of the infinite infrakernel have that niceness property). And m↦m(f∗) is lower-semicontinuous when f∗ is. Also, by now, the only thing i is controlling is the measure term. Therefore, that liminf of the integral (in i) is greater than what you'd get if you just passed to m∞. So, we get
≥liminfn→∞∫S(λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
Again, for all n, the integral is finite. Now, since the functions
λs0.Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞))
converge pointwise in n to the function
λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞))
(any sequence of compact sets works to define the infinite semidirect product) and for all n, the integrals are finite (this is why we kept restating it over and over!), we can apply Fatou's Lemma to move the liminf into the integral, to get
≥∫S(λs0.liminfn→∞Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
And since we have pointwise limiting, this is
=∫S(λs0.limn→∞Kπ∞:n(s0)(λaos1:n+1.infC⊤n+2:∞[CS,ϵ]f(s0,aos1:n+1,aosn+2:∞)))dm∞
+b∞−2ϵ||f||(2λ⊙+1)
and then, packing it up,
=∫S(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))dm∞+b∞−2ϵ||f||(2λ⊙+1)
And then, pack it up as
=m∞(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b∞−2ϵ||f||(2λ⊙+1)
and one last step. Since (m∞,b∞) lies in Ψ, we can get
≥inf(m,b)∈Ψ(m(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))+b)−2ϵ||f||(2λ⊙+1)
which reexpresses as
=ψ(λs0.Kπ∞:∞(s0)(λaos1:∞.f(s0,aos1:∞)))−2ϵ||f||(2λ⊙+1)
=(ψ⋉Kπ∞:∞)(f)−2ϵ||f||(2λ⊙+1)
Our net result is
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)−2ϵ||f||(2λ⊙+1)
But because ϵ was arbitrary and ||f|| and λ⊙ are finite, we have our desired net result of
liminfm→∞(ψ⋉Kπm:∞)(f)≥(ψ⋉Kπ∞:∞)(f)
And lower-semicontinuity is shown for π↦(ψ⋉Kπ:∞)(f)
T4.2.2.3 Now for our phase 3, getting lower-semicontinuity of π↦Θ(π)(U). Let πm limit to π∞. We have
liminfm→∞Θ(πm)(U)
and then this unpacks as
=liminfm→∞pr(A×O)ω∗(ψ⋉Kπm:∞)(U)
Reexpressing the projection, we have:
=liminfm→∞(ψ⋉Kπm:∞)(λs0,aos1:∞.U(ao1:∞))
And then, we can apply lower-semicontinuity of π↦(ψ⋉Kπ:∞)(f) which we've shown in phase 2, to get
≥(ψ⋉Kπ∞:∞)(λs0,aos1:∞.U(ao1:∞))
Which then packs up as
=pr(A×O)ω∗(ψ⋉Kπ∞:∞)(U)
and then
=Θ(π∞)(U)
and we have lower-semicontinuity for Θ! Finally!
T4.2.3 Time to return to easier fare. We need that Θ(π) is supported entirely on histories compatible with π. To do this, we proceed in three steps.
First, we do an induction proof that, for any given π and s0, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the actions and observations so far are compatible with π.
Second, we use our first result to show that for any two continuous bounded utility functions U,U′ which only depend on the action/observation sequence, and agree on all histories compatible with π,
Kπ:∞(s0)(λaos1:∞.U(ao1:∞))=Kπ:∞(s0)(λaos1:∞.U′(ao1:∞))
Finally, we use this result to show that if U,U′ agree on all histories compatible with π, Θ(π)(U)=Θ(π)(U′), showing that Θ(π) is supported on histories compatible with π.
T4.2.3.1 Let's start with the induction proof that Kπ:n(s0) has the subset of (A×O×S)n+1 where the history of actions and observations is compatible with π as a support. The way to do that is to pick two functions that are identical on said subset, and show they have equal expectation values. For the base case of the induction proof, fix an f and f′ which are identical on {π()}×O×S (as that's the subset of A×O×S which is compatible with π), and observe:
Kπ:0(s0)(f)=Kπ0(s0)(λa,o,s.f(a,o,s))
and then by how Kπ0(s0) is defined, we have
=(δπ()⋉(λa.K(s0,a)))(λa′,o,s.f(a′,o,s))
Which, by unpacking the semidirect product and subsituting the dirac-delta value in, we have
=K(s0,π())(λo,s.f(π(),o,s))
and then since f=f′ when the initial action equals π(), we have
=K(s0,π())(λo,s.f′(π(),o,s))
and then this just wraps back up by our sequence of rewrite steps in reverse as
=Kπ:0(s0)(f′)
Now that we have the base case out of the way, we'll attempt to show the induction step. Let f and f′ be identical on the subset of (A×O×S)n+2 which is compatible with π, and we'll show they have equal expectation value. Our job is to unpack
|Kπ:n+1(s0)(f)−Kπ:n+1(s0)(f′)|
this unpacks, by how sequences of semidirect products are defined, as
=|Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s)))
−Kπ:n(s0)(λaos1:n+1.Kπn+1(s0,aos1:n+1)(λa,o,s.f′(aos1:n+1,a,o,s)))|
Now, since, by induction assumption, Kπ:n(s0) is supported on the subset of (A×O×S)n+1 where the history is compatible with π, we just need to show that the inner functions are identical on that subset to get that the difference is 0, so our proof goal is now
∀aos1:n+1∼π:Kπn+1(s0,aos1:n+1)(λa,o,s.f(aos1:n+1,a,o,s))
=Kπn+1(s0,aos1:n+1)(λa,o