Lemma 3: If F is a continuous function of type X→K(Y), where K(Y) is the space of nonempty compact subsets of the space Y, then given any compact set CX⊆X, ⋃x∈CXF(x) will be compact in Y.
Fix some compact set CX⊆X, and continuous function F:X→K(Y). We will operate by taking an arbitrary open cover of ⋃x∈CXF(x) and finding a finite subcover.
Let {Oi}i∈I be an open cover of ⋃x∈CXF(x). The Oi are subsets of Y. The topology compatible with Hausdorff distance on K(Y) (space of compact subsets of Y) is the Vietoris topology, where the basis opens are given by finite collections of open sets in Y: a basis open consists of all compact subsets of Y which are subsets of the union of the finite collection of open sets, and which intersect every open set in the collection.
Accordingly, let J=Pfin(I) (the set of all finite subsets of I, the index set for our open cover), and fix a collection of open sets in K(Y), {Oj}j∈J. The sets Oj are defined as:
Oj:={CY∈K(Y)|CY⊆⋃i∈jOi∧∀i∈j:CY∩Oi≠∅}
Now, all the F(x) with x∈CX are compact (F produces compact sets as output), and they are all subsets of ⋃x∈CXF(x), so {Oi}i∈I is a cover of F(x), and due to its compactness we can identify a finite subcover, and prune away every open set which doesn't intersect F(x). F(x) is a subset of the union of those finitely many open sets, and intersects all of them, so the point F(x)∈K(Y) lies in the open set Oj induced by that finite cover of open sets.
This argument works for arbitrary F(x) with x∈CX, so the collection {Oj}j∈J is an open cover of F(CX). Also, because F is continuous and CX is compact, F(CX) is compact, so we can identify a finite subcover from {Oj}j∈J.
Then, consider the collection of open sets Oi where i∈j for some Oj which is part of the finite cover of F(CX). This is finitely many opens, since we're unioning together finitely many (finitely many Oj selected) finite sets of open sets (each Oj is associated with finitely many Oi that it was built from).
Now we just have to show that this collection covers ⋃x∈CXF(x), and we'll have made our finite subcover and shown that said set is compact. Assume our finite collection of opens doesn't cover the set. Then there's some F(x) which wasn't covered completely. However, the point corresponding to F(x) in K(Y) lies in some Oj, and from its definition, the corresponding Oi manage to cover F(x), and we have a contradiction. We're done.
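As a quick sanity check of the lemma, here is a small worked instance. The particular choice of $X$, $Y$, and $F$ is an illustrative assumption and is not part of the original argument.

```latex
\begin{align*}
X &= [0,1], \qquad Y = \mathbb{R}, \qquad F(x) := [x,\, x+1] \in K(Y) \\
d_H\big(F(x), F(x')\big) &= |x - x'| \quad \text{(so $F$ is continuous)} \\
\bigcup_{x \in [0,1]} F(x) &= [0, 2] \quad \text{(compact in $Y$)}
\end{align*}
```

By contrast, taking the non-compact set $\mathbb{R}$ in place of $[0,1]$ gives the union $\mathbb{R}$, which is not compact, so the compactness of the index set is doing real work.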
Our task now is to show that ⋃x∈Ch({x}×CK(x)) is compact, which will take a fair amount of topology work. Our first piece that we'll need is that if xn limits to x, then CK(xn) limits to CK(x) in Hausdorff-distance.
To show this, we'll split it into two parts. First, we'll assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x) and disprove that. Second, we'll assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn), and disprove that.
For the first part, assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). Craft the continuous function
$$f_1 := \lambda y.\sup\left(1 - \frac{1}{\epsilon}\inf_{y' \in C_{K(x)}} d(y, y'),\ 0\right)$$
This function is 1 on the set $C_{K(x)}$, and 0 at any point that is $\epsilon$ or more away from it. One of our conditions on an infrakernel was that $\lim_{n\to\infty}K(x_n)(f)=K(x)(f)$, and each sharp $K(x)$ evaluates a function as its infimum over $C_{K(x)}$, so:
$$\lim_{n\to\infty}\inf_{y \in C_{K(x_n)}} f_1(y) = \inf_{y \in C_{K(x)}} f_1(y)$$
The right-hand side is 1 because $f_1$ is 1 on $C_{K(x)}$. However, because we're assuming that, infinitely often, there's a point in $C_{K(x_n)}$ that is $\epsilon$ away from $C_{K(x)}$, the sequence on the left-hand side is infinitely often 0, so it can't converge to 1, and we have a contradiction.
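To make the role of the bump function concrete, here is a minimal sketch on the real line. It assumes finite point sets stand in for the compact sets, and that a sharp infradistribution is modeled as "minimum of the function over the set"; the names `bump`, `sharp`, `limit_set`, and `straying_set` are hypothetical.

```python
def distance_to_set(y, points):
    """Distance from the real number y to a finite set of reals."""
    return min(abs(y - c) for c in points)


def bump(points, eps):
    """f1(y) = max(1 - dist(y, points) / eps, 0)."""
    return lambda y: max(1.0 - distance_to_set(y, points) / eps, 0.0)


def sharp(points):
    """Toy sharp infradistribution on a finite set: f -> min of f over the set."""
    return lambda f: min(f(y) for y in points)


limit_set = [0.0, 1.0]          # stands in for the limiting compact set
straying_set = [0.0, 1.0, 1.5]  # a set with a point that is eps = 0.5 away
f1 = bump(limit_set, eps=0.5)

value_on_limit = sharp(limit_set)(f1)        # 1.0: f1 is 1 on the limiting set
value_on_straying = sharp(straying_set)(f1)  # 0.0: the stray point drags the min to 0
```

A single stray point is enough to drop the value from 1 to 0, which is why infinitely many stray sets are incompatible with convergence to 1.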
For the second part, assume there is an $\epsilon$ where, infinitely often, there is a point in $C_{K(x)}$ that is $\epsilon$ away from $C_{K(x_n)}$. By compactness of $C_{K(x)}$, we can find finitely many points $y_i$ in it such that every point in $C_{K(x)}$ is less than $\frac{\epsilon}{2}$ away from one of the $y_i$ (cover $C_{K(x)}$ with $\frac{\epsilon}{2}$-size open balls centered on points in it and take a finite subcover). Now, for each of these, we can craft a function
$$f_i := \lambda y.\inf\left(1,\ \frac{2}{\epsilon}\, d(y, y_i)\right)$$
So, this is 0 at the point $y_i$, and 1 at any distance $\frac{\epsilon}{2}$ or more away from it.
One of our conditions on an infrakernel was that $\lim_{n\to\infty}K(x_n)(f)=K(x)(f)$, and there are finitely many $f_i$, so there's some time after which all of them have nearly converged, i.e.:
$$\lim_{n\to\infty}\sup_i |K(x_n)(f_i) - K(x)(f_i)| = 0$$
However, infinitely often there's a point $y_n \in C_{K(x)}$ that is $\epsilon$ away from $C_{K(x_n)}$. $y_n$ is less than $\frac{\epsilon}{2}$ away from some $y_i$, so that $y_i$ can't be closer than $\frac{\epsilon}{2}$ to $C_{K(x_n)}$. (If it were closer, then we could pick some point in $C_{K(x_n)}$ that's closer than $\frac{\epsilon}{2}$ to $y_i$, and then since $y_i$ is less than $\frac{\epsilon}{2}$ away from $y_n$, the triangle inequality would put the distance from $y_n$ to $C_{K(x_n)}$ below $\epsilon$, an impossibility.)
Because the distance from $y_i$ to any point in $C_{K(x_n)}$ is at least $\frac{\epsilon}{2}$, then
$$|K(x_n)(f_i) - K(x)(f_i)| = \left|\inf_{y \in C_{K(x_n)}} f_i(y) - \inf_{y \in C_{K(x)}} f_i(y)\right| = |1 - 0| = 1$$
This is because $y_i \in C_{K(x)}$ and attains a value of 0 according to $f_i$, while $C_{K(x_n)}$ stays away from $y_i$ and all its points must have a value of 1. This situation happens infinitely often, which leads to a contradiction with
$$\lim_{n\to\infty}\sup_i |K(x_n)(f_i) - K(x)(f_i)| = 0$$
because infinitely often, one of these $f_i$ has very different values, so the sequence is 1 infinitely often and can't limit to 0.
So, we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). And we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). Fixing any ϵ, in the tail of the sequence, CK(x) and CK(xn) are ϵ distance or closer in Hausdorff distance because you can't find points in either set which are far away from the other set. So, CK(xn) limits to CK(x) in Hausdorff-distance when xn limits to x, and we know that x↦CK(x) is a continuous function X→K(Y).
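The two-sided nature of this argument mirrors the definition of Hausdorff distance as the larger of two directed distances. A minimal sketch for finite subsets of the real line follows; the sets `limit_set` and `moving_set(n)` are hypothetical stand-ins for the limiting compact set and the sets along the sequence.

```python
def directed_distance(A, B):
    """Largest distance from a point of A to the set B (finite sets of reals)."""
    return max(min(abs(a - b) for b in B) for a in A)


def hausdorff(A, B):
    """Hausdorff distance: both directions must be small for the sets to be close."""
    return max(directed_distance(A, B), directed_distance(B, A))


limit_set = [0.0, 1.0]


def moving_set(n):
    """A set that slides toward limit_set as n grows."""
    return [1.0 / n, 1.0 + 1.0 / n]


gaps = [hausdorff(moving_set(n), limit_set) for n in (1, 2, 4, 8)]
# gaps == [1.0, 0.5, 0.25, 0.125]

# One direction alone is not enough: every point of [0.0] is in limit_set,
# yet limit_set has a point at distance 1 from [0.0].
one_way = directed_distance([0.0], limit_set)    # 0.0
other_way = directed_distance(limit_set, [0.0])  # 1.0
```

The first part of the proof controls `directed_distance` from the moving sets to the limit set, and the second part controls the reverse direction.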
This lets us show that the set
⋃x∈Ch({x}×CK(x))
is closed, because if xn limits to x and yn∈CK(xn) and yn limits to y, we have that y∈CK(x) because CK(xn) limits to CK(x) in Hausdorff distance, so we've got closed graph.
Also, by invoking Lemma 3, we know that
⋃x∈ChCK(x)
is compact.
Time to wrap this all up. We know that ⋃x∈Ch{x}×CK(x) is closed in X×Y from our Hausdorff limit argument. This set is also a subset of:
Ch×⋃x∈ChCK(x)
which is a product of two sets known to be compact, and so is compact. Our set is a closed subset of a compact set, so it's compact. Therefore,
⋃x∈Ch{x}×CK(x)
is a compact set, and from way back,
$$(h \ltimes K)(f) = \inf_{(x,y) \in \bigcup_{x \in C_h}(\{x\} \times C_{K(x)})} f(x,y)$$
And we've shown that set is compact, so h⋉K where h and all the K(x) are sharp can be written as minimizing over a compact set, so h⋉K is sharp. Thus, semidirect product preserves all the nice properties, and we're finally done with this proof.
Proposition 20: If all the $K(x)$ are C-additive, then $\text{pr}_{X*}(h \ltimes K) = h$.
This is because, since f(x) doesn't depend on y, it acts as a constant inside K(x) and C-additivity lets us pull it out.
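Spelled out, the computation presumably runs as follows. This sketch assumes the usual definition of the projection pushforward, $\text{pr}_{X*}(\psi)(f) = \psi(\lambda x,y.f(x))$, and writes C-additivity as $\psi(g + c) = \psi(g) + c$ for constants $c$; neither is quoted from the text above.

```latex
\begin{align*}
\text{pr}_{X*}(h \ltimes K)(f)
  &= (h \ltimes K)(\lambda x, y.\, f(x)) \\
  &= h\big(\lambda x.\, K(x)(\lambda y.\, f(x))\big) \\
  &= h\big(\lambda x.\, f(x) + K(x)(\lambda y.\, 0)\big)
     && \text{(C-additivity of $K(x)$, with $f(x)$ constant in $y$)} \\
  &= h(\lambda x.\, f(x)) = h(f)
     && \text{(normalization, $K(x)(0) = 0$)}
\end{align*}
```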
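Proposition 21: If $K_0, K_1, K_2, \ldots$ is a sequence of infrakernels of type $K_n : \prod_{i=0}^{n} X_i \xrightarrow{ik} X_{n+1}$, and $h$ is an infradistribution over $X_0$, then $(\ldots((h \ltimes K_0) \ltimes K_1) \ldots \ltimes K_n)$ can be rewritten as $h \ltimes K_{:n}$, where $K_{:n}$ is an infrakernel of type $X_0 \xrightarrow{ik} \prod_{i=1}^{n+1} X_i$, recursively defined as $K_{:0} := K_0$ and $K_{:n+1}(x_0) := K_{:n}(x_0) \ltimes (\lambda x_{1:n+1}.K_{n+1}(x_0, x_{1:n+1}))$.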
So, for our inductive definition,
$$K_{:0}(x_0) := K_0(x_0)$$
$$K_{:n+1}(x_0) := K_{:n}(x_0) \ltimes (\lambda x_{1:n+1}.K_{n+1}(x_0, x_{1:n+1}))$$
Our task is to show that these are all infrakernels, by induction, and that for any infradistribution h,
(...((h⋉K0)⋉K1)...⋉Kn)=h⋉K:n
For the base case, we observe that K:0 is an infrakernel because it equals K0, which is an infrakernel, and that h⋉K0=h⋉K:0
Time for the induction step. We'll assume that K:n is an infrakernel, and show that K:n+1 is. Further, we need to show that h⋉K:n+1=(h⋉K:n)⋉Kn+1. This will show the result.
Our first requirement is showing that for all x0, K:n+1(x0) is an infradistribution.
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
By our induction assumption, K:n(x0) is an infradistribution as K:n is an infrakernel. Further, λx1:n+1.Kn+1(x0,x1:n+1) is an infrakernel because Kn+1 is and we're just restricting it to a subset of its domain, so it keeps being an infrakernel. And we know from earlier that the semidirect product of an infradistribution and an infrakernel is an infradistribution. So that's taken care of.
Now, we must show a common Lipschitz constant, pointwise function convergence, and compact-shared compact almost-support for K:n+1 to certify that it's an infrakernel.
Starting with common Lipschitz constant, we can just note that, in our proof of Proposition 19, we saw that the Lipschitz constant of the semidirect product was upper-bounded by the product of the Lipschitz constants of the starting infradistribution and the kernel. Assuming that K:n is an infrakernel, we have that the Lipschitz constant of any K:n(x0) is upper-bounded by some constant λ⊙:n. Also, the Lipschitz constant of Kn+1(x0,x1:n+1) is upper-bounded by some constant λ⊙n+1. Thus, λ⊙:nλ⊙n+1 is an upper-bound on the Lipschitz constant of any
K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
infradistribution, which is exactly K:n+1(x0), witnessing that K:n+1 has a uniform upper bound on its Lipschitz constants.
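Unrolling this recursion gives an explicit (if loose) bound. Here $\lambda^\odot_i$ denotes a Lipschitz bound for $K_i$, following the notation above; the closed form is a direct consequence of the induction step rather than something stated in the original.

```latex
\begin{align*}
\lambda^\odot_{:0} &= \lambda^\odot_0, \qquad
\lambda^\odot_{:n+1} \le \lambda^\odot_{:n}\,\lambda^\odot_{n+1} \\
\Longrightarrow \quad \lambda^\odot_{:n} &\le \prod_{i=0}^{n} \lambda^\odot_i
\end{align*}
```

In particular, if every $K_i$ is 1-Lipschitz, this bound says every $K_{:n}$ is 1-Lipschitz as well.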
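Time to move onto the second one, compact-shared compact almost-support. What we need is: for every compact $C^{X_0} \subseteq X_0$ and every $\epsilon$, there is a compact subset of $\prod_{i=1}^{n+2} X_i$ such that, for all $x_0 \in C^{X_0}$, any two functions $f$ and $f'$ which agree on that compact set satisfy $|K_{:n+1}(x_0)(f) - K_{:n+1}(x_0)(f')| \le \epsilon\, d(f, f')$.
Here, $f$ and $f'$ have type signature $\prod_{i=1}^{n+2} X_i \to \mathbb{R}$.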
Now, this is going to be quite complicated, so pay close attention. Fix an arbitrary compact CX0⊆X0, and an arbitrary ϵ. Let λ⊙:n be the Lipschitz constant for the infrakernel K:n, and λ⊙n+1 be the Lipschitz constant for the infrakernel Kn+1.
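Due to compact-shared compact almost-support for $K_{:n}$, which holds by our induction assumption, the set $C^{X_0}$ induces a compact $\frac{\epsilon}{2\lambda^\odot_{n+1}}$-almost-support for the family of infradistributions $K_{:n}(x_0)$ where $x_0 \in C^{X_0}$. Call said almost-support $C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})}$.
Further, due to compact-shared compact almost-support for $K_{n+1}$, the set
$$C^{X_0} \times C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})}$$
induces a compact $\frac{\epsilon}{2\lambda^\odot_{:n}}$-almost-support for the family of infradistributions $K_{n+1}(x_0, x_{1:n+1})$ where $(x_0, x_{1:n+1}) \in C^{X_0} \times C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})}$.
Call said almost-support $C^{X_{n+2}}_{\epsilon/(2\lambda^\odot_{:n})}$.
And now let the shared $\epsilon$-almost-support for $K_{:n+1}(x_0)$ where $x_0 \in C^{X_0}$ be:
$$C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})} \times C^{X_{n+2}}_{\epsilon/(2\lambda^\odot_{:n})}$$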
We must show that said set is indeed a shared ϵ-almost-support for K:n+1(x0) where x0∈CX0. So, let f and f′ agree on said set. Then, we have:
This is just unpacking the definition of the iterated semidirect product, no issues here. Now, we use Lemma 2 and the fact that $C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})}$ is an $\frac{\epsilon}{2\lambda^\odot_{n+1}}$-almost-support for $K_{:n}(x_0)$ when $x_0 \in C^{X_0}$, to get:
first. What we can do is use that, regardless of what is picked in the supremum, we have:
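$$(x_0, x_{1:n+1}) \in C^{X_0} \times C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})}$$
So this means that
$$C^{X_{n+2}}_{\epsilon/(2\lambda^\odot_{:n})}$$
is an $\frac{\epsilon}{2\lambda^\odot_{:n}}$-almost-support for $K_{n+1}(x_0, x_{1:n+1})$. Further, because $f$ and $f'$ are identical on
$$C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(2\lambda^\odot_{n+1})} \times C^{X_{n+2}}_{\epsilon/(2\lambda^\odot_{:n})}$$
and $x_{1:n+1}$ was being selected from the former of those, the functions $\lambda x_{n+2}.f(x_{1:n+1}, x_{n+2})$ (and the same for $f'$) agree on $C^{X_{n+2}}_{\epsilon/(2\lambda^\odot_{:n})}$, the almost-support. So, the supremum is upper-bounded by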
And so we've shown that the values on the two functions are only ϵ times their distance apart, so the compact set we cooked up is indeed an ϵ-almost-support for K:n+1(x0) whenever x0∈CX0, and because ϵ and CX0 were arbitrary, we have compact-shared compact almost-support for K:n+1.
Time to move onto the third one, pointwise convergence. If x0,m limits to x0,∞, we want K:n+1(x0,m)(f) to limit to K:n+1(x0,∞)(f). As usual, we use λ⊙n+1 for the Lipschitz constant of Kn+1 and λ⊙:n for the Lipschitz constant of K:n.
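To begin with, fix an arbitrary $\epsilon$ and bounded continuous function $f$, and note that $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}}$ is a compact subset of $X_0$. Because $K_{:n} : X_0 \xrightarrow{ik} \prod_{i=1}^{n+1} X_i$ is assumed to be an infrakernel by induction, $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}}$ acts as a compact set for it. So, by compact-shared compact almost-support for $K_{:n}$, we can find a compact set $C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(4\lambda^\odot_{n+1}\|f\|)}$ which is an $\frac{\epsilon}{4\lambda^\odot_{n+1}\|f\|}$-almost-support for all the $K_{:n}(x_{0,m})$.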
Also, it is important to note that
λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2))
Is a continuous function (as it must be for semidirect products with Kn+1 to have the functions on the inside be continuous). Accordingly, this means that the function:
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
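must be uniformly continuous when restricted to the set $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}} \times C^{\prod_{i=1}^{n+1}X_i}_{\epsilon/(4\lambda^\odot_{n+1}\|f\|)}$. And so, by uniform continuity, for our $\epsilon$ there is some $\delta$ difference in inputs which gives rise to at most an $\frac{\epsilon}{2\lambda^\odot_{:n}}$ difference in output.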
Now, here's what we'll be doing. We'll attempt to show the result that
Straight off the bat, we can apply Lemma 2 to decompose this difference into "starting Lipschitz constant times the difference of the inner functions on the compact set of interest" and "level of almost-support times the difference of the two functions", yielding:
Time to start breaking this down. First, to break down
supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
we can realize that the maximum value of one of these would be λ⊙n+1||f||, and the minimum possible value of one of these is −λ⊙n+1||f||, from Lipschitzness of Kn+1, producing an upper bound of:
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))) so
limm→∞(K:n(x0,m)⋉(λx1:n+1.Kn+1(x0,m,x1:n+1)))(f)
=(K:n(x0,∞)⋉(λx1:n+1.Kn+1(x0,∞,x1:n+1)))(f)
so
limm→∞K:n+1(x0,m)(f)=K:n+1(x0,∞)(f)
And we're done, we showed pointwise convergence for K:n+1 which is the last condition necessary to show it's an infrakernel, and the induction proof goes through to show that all the K:n are infrakernels.
Now all that's left is to show that
h⋉K:n+1=(h⋉K:n)⋉Kn+1
using induction, we have the base case set up. We can go:
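The first nontrivial instance of this identity, $(h \ltimes K_0) \ltimes K_1 = h \ltimes K_{:1}$, can be sanity-checked numerically. The sketch below assumes finite spaces and models each infradistribution as a worst-case expectation over a finite set of probability distributions; all names and numbers (`make_infra`, `K0`, `K1`, the toy spaces) are hypothetical.

```python
def make_infra(dists):
    """Toy infradistribution on a finite set: worst-case expectation over dists."""
    def h(f):
        return min(sum(p * f(x) for x, p in d.items()) for d in dists)
    return h


# Toy spaces: X0 = {0, 1}, X1 = {"a", "b"}, X2 = {0, 1}.
h = make_infra([{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}])


def K0(x0):
    """Infrakernel X0 -> X1."""
    if x0 == 0:
        return make_infra([{"a": 1.0}, {"a": 0.5, "b": 0.5}])
    return make_infra([{"a": 0.25, "b": 0.75}])


def K1(x0, x1):
    """Infrakernel X0 x X1 -> X2."""
    p = 0.25 if x1 == "a" else 0.75
    q = 0.5 * (x0 + p)  # another probability in (0, 1)
    return make_infra([{0: 1 - p, 1: p}, {0: 1 - q, 1: q}])


def iterated(f):
    """((h |x K0) |x K1)(f): apply the semidirect product twice."""
    h_K0 = lambda g: h(lambda x0: K0(x0)(lambda x1: g(x0, x1)))
    return h_K0(lambda x0, x1: K1(x0, x1)(lambda x2: f(x0, x1, x2)))


def K_upto_1(x0):
    """K_{:1}(x0) := K0(x0) |x (lambda x1. K1(x0, x1))."""
    return lambda g: K0(x0)(lambda x1: K1(x0, x1)(lambda x2: g(x1, x2)))


def collapsed(f):
    """(h |x K_{:1})(f)."""
    return h(lambda x0: K_upto_1(x0)(lambda x1, x2: f(x0, x1, x2)))


f = lambda x0, x1, x2: x0 + 2 * x2 - (1 if x1 == "b" else 0)
difference = abs(iterated(f) - collapsed(f))
```

Both sides unpack to the same nested expression, which is the whole content of the identity; the check only confirms that the two ways of bracketing were transcribed consistently.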
Proposition 22: $K_{:\infty}$ is an infrakernel (C-additive, specifically) if all the $K_n$ are C-additive infrakernels. It is unchanged by altering the $C_i$ sequence of compact sets. In addition, if all the $K_n$ are homogeneous/cohomogeneous/crisp/sharp, then $K_{:\infty}$ will be so as well.
So, $K_{:\infty} : X_0 \xrightarrow{ik} \prod_{i=1}^{\infty} X_i$ is defined as follows. Fixing an arbitrary sequence of compact sets $C_i \subseteq X_i$,
$$K_{:\infty}(x_0)(f) := \lim_{n\to\infty} K_{:n}(x_0)\left(\lambda x_{1:n+1}.\inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x_{1:n+1}, x_{n+2:\infty})\right)$$
Is it an infrakernel?
This is going to suck unbelievably much, we're gonna need a ton of results. The game plan is:
Part 1: Show that the functions you're feeding into those infrakernels are guaranteed to be continuous, to make some progress towards showing that K:∞ is well-defined.
Part 2: Show that all the K:n are 1-Lipschitz, and also preserve all nice properties we'd want if all the Kn do (homogeneity, cohomogeneity, C-additivity, crispness, sharpness).
Part 3: Show that if a function only depends on the first n coordinates of the input, then all the K:n+m start agreeing on the expectation value of the function.
Part 4: Give a general procedure for taking a compact subset of the space X0 and making a compact subset of the space ∏∞i=1Xi with nice properties related to compact almost-support, that preserves its nice properties when projected down to any finite stage.
Part 5: Use parts 2, 3, 4, and a complicated chain of reasoning to get a result which implies that it doesn't matter which Ci sequence you pick, the limit will exist and be the same for all of them, so K:∞ actually exists and is well-defined.
Part 6: Using parts 2 and 5, clean up the normalization, monotonicity, concavity, and C-additivity properties of K:∞. Showing that all the K:∞(x0) are C-additive trivially nets the bounded Lipschitz constant property to show that K:∞ is an infrakernel and K:∞(x0) is an infradistribution.
Part 7: Use our trick from Part 4 and our freedom of picking our compact set sequence from Part 5 to show compact-shared compact almost-support for K:∞, netting us the second infrakernel property, and the compact almost-support property for all the individual components of kernel, verifying the last condition we need to conclude that K:∞(x0) is an infradistribution.
Part 8: We recap one of the arguments for part 5, and it lets us get uniform convergence for a certain limit on any compact set, which is a critical lemma for Part 9.
Part 9: We use our result from Part 8 to invoke the Moore-Osgood theorem in order to show pointwise convergence for K:∞, wrapping up the last condition for it to be an infrakernel.
Part 10: Show that if all the K:n have some nice property, then the limit K:∞ inherits it too.
The proofs will proceed in a strange way to keep track of all the moving parts in places. We'll first present the thing we're trying to prove, and repeatedly go "we could prove it if we could prove this other thing", and keep chaining back until we get something that's easy to show.
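Proof Part 1: Our desired result is that the function $\lambda x_{1:n+1}.\inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x_{1:n+1}, x_{n+2:\infty})$ is continuous. So, letting $x^m_{1:n+1}$ limit to $x^\infty_{1:n+1}$, our task is to show that the values of this function at $x^m_{1:n+1}$ limit to its value at $x^\infty_{1:n+1}$.
Now, what we can do is consider the compact subset of $\prod_{i=1}^{\infty} X_i$ given by $\{x^m_{1:n+1}\}_{m \in \mathbb{N} \cup \{\infty\}} \times \prod_{i=n+2}^{\infty} C_i$.
And then $f$ must be uniformly continuous on it, so given any $\epsilon$, there is some $\delta$ where points only $\delta$ apart lead to only an $\epsilon$ difference in value. You can consider $m$ to be big enough to guarantee that all future values of $x^m_{1:n+1}$ are within $\delta$ of $x^\infty_{1:n+1}$, and then this gets that the function values can only differ by $\epsilon$ between $(x^m_{1:n+1}, x_{n+2:\infty})$ and $(x^\infty_{1:n+1}, x_{n+2:\infty})$ if $x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i$, which it is. This ensures that the worst-case function values are only $\epsilon$ apart. This works for all $\epsilon$, showing the desired convergence.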
And so, all the functions we're feeding into the K:n(x0) are continuous.
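The step from "function values are close" to "worst-case values are close" rests on the standard fact that two infima over the same set differ by at most the supremum of the pointwise difference. Written out in the notation of this part (with $a$ as a hypothetical shorthand for the tail $x_{n+2:\infty}$):

```latex
\begin{align*}
\left| \inf_{a \in \prod_{i=n+2}^{\infty} C_i} f(x^m_{1:n+1}, a)
     - \inf_{a \in \prod_{i=n+2}^{\infty} C_i} f(x^\infty_{1:n+1}, a) \right|
&\le \sup_{a \in \prod_{i=n+2}^{\infty} C_i}
     \left| f(x^m_{1:n+1}, a) - f(x^\infty_{1:n+1}, a) \right| \\
&\le \epsilon \qquad \text{for all sufficiently large } m
\end{align*}
```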
Proof Part 2: Desired result is "if all the Kn have a nice property, then all the K:n have it too".
This can be simply addressed by noting that, for the base case, because K:0=K0 and we're assuming all the Kn have the property in question (C-additivity/cohomogeneity/homogeneity/crispness/sharpness), K:0 trivially fulfills it.
And for the induction step, if we assume that K:n has the nice property (1-Lipschitzness, say), note that:
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And, by our results on semidirect products preserving nice properties, if K:n(x0) has the nice property (by induction assumption) and Kn+1 does, then we get that K:n+1(x0) preserves the same property, and it holds all the way up the K:n. And we can move on to Part 3.
Proof Part 3: Showing that, if we go far enough out in the K:n, the value assigned to functions which only depend on finitely many inputs stabilizes. The result that we'd like to show at this point is:
Admittedly, f is not of the proper type signature to be evaluated by K:n+m(x0), but we're abusing notation so that we can feed it in anyways and it just ignores all the coordinates it doesn't need. Accordingly, fix an arbitrary x0,n,f, and our proof target will now be:
∀m∈N:K:n+m(x0)(f)=K:n+m+1(x0)(f)
Proving this would let you apply induction, because we have a base case where K:n+0(x0)(f)=K:n(x0)(f). Let m be arbitrary. Then, we can go:
This is a bit complicated. It's saying that if you pick any compact subset of $X_0$, you can make a compact subset of $\prod_{i=1}^{\infty} X_i$ whose projection to coordinates 1 through $n+1$ acts as a compact $\epsilon\left(1 - \frac{1}{2^{n+1}}\right)$-almost-support for all the $K_{:n}(x_0)$ infradistributions when $x_0$ lies in your compact subset of $X_0$, regardless of what $n$ is.
Accordingly, fix some CX0 and ϵ. Now, we can recursively build up compact subsets of all the Xn in the following way.
$C^{X_{n+1}}_{\epsilon/2^{n+1}} \subseteq X_{n+1}$ is an $\frac{\epsilon}{2^{n+1}}$-almost-support for all the $K_n(x_{:n})$ where $x_{:n} \in C^{X_0} \times \prod_{i=1}^{n} C^{X_i}_{\epsilon/2^i}$. So, basically, we're recursively building up compact subsets of $\prod_{i=0}^{n} X_i$ by taking products of earlier compact subsets (with the base case being $C^{X_0}$), and then going "that's a compact subset of the input to $K_n$, so we must be able to find a compact subset of $X_{n+1}$ that's an $\frac{\epsilon}{2^{n+1}}$-almost-support for all the $K_n(x_{:n})$ where $x_{:n}$ lies in our compact subset of input, because of the compact-shared compact almost-support condition for $K_n$" to go to the next compact set.
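The halving tolerances are chosen so that their running total never reaches $\epsilon$. Assuming the almost-support errors of the individual stages simply add up (a reading of how the induction is organized, not a quoted step), the total after stage $n+1$ is a geometric sum:

```latex
\begin{align*}
\sum_{i=1}^{n+1} \frac{\epsilon}{2^{i}}
  = \epsilon\left(1 - \frac{1}{2^{n+1}}\right) < \epsilon
\end{align*}
```

For example, the first three stages contribute $\frac{\epsilon}{2} + \frac{\epsilon}{4} + \frac{\epsilon}{8} = \frac{7\epsilon}{8}$.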
To establish some notation to make this a bit easier, let
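$$C_i[C^{X_0}, \epsilon] := C^{X_i}_{\epsilon/2^{i}}$$
(the $i$'th compact set in the sequence, defined with $C^{X_0}$ to start building your sequence), and let
$$C_{1:n}[C^{X_0}, \epsilon] := \prod_{i=1}^{n} C_i[C^{X_0}, \epsilon]$$
(the product of compact sets 1 through $n$, which is compact)
And let
$$C_{1:\infty}[C^{X_0}, \epsilon] := \prod_{i=1}^{\infty} C_i[C^{X_0}, \epsilon]$$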
This is the product of all the compact sets, and is compact.
Note the dependence of these on the starting compact sets. Notice that the projection of C1:∞[CX0,ϵ] to coordinates 1 through n is exactly C1:n[CX0,ϵ].
Now that this is established, our proof target is (using our new notation):
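Using that $K_{:0} = K_0$, that $C_1[C^{X_0}, \epsilon] = C^{X_1}_{\epsilon/2}$, and that $\epsilon\left(1 - \frac{1}{2}\right) = \frac{\epsilon}{2}$, our proof target is now:
$$\forall x_0 \in C^{X_0},\ f, f' \in CB(X_1):\quad f{\downarrow}_{C^{X_1}_{\epsilon/2}} = f'{\downarrow}_{C^{X_1}_{\epsilon/2}} \to |K_0(x_0)(f) - K_0(x_0)(f')| \le \frac{\epsilon}{2}\, d(f, f')$$
However, we constructed $C^{X_1}_{\epsilon/2}$ to be an $\frac{\epsilon}{2}$-almost-support for all the $K_0(x_0)$ where $x_0 \in C^{X_0}$, so this statement is just true, and we're done with our base case.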
Therefore let x0,f,f′ be arbitrary, and remember that they have the indicated properties, and that f,f′ agree with each other on the indicated set C1:n+2[CX0,ϵ]. Our proof target is now:
$$|K_{:n+1}(x_0)(f) - K_{:n+1}(x_0)(f')| \le \epsilon\left(1 - \frac{1}{2^{n+2}}\right) d(f, f')$$
Unpacking the definition of K:n+1 and rewriting the thing on the end, this is equivalent to (we now take this as the proof target)
We can apply the Lemma 2 decomposition, to split this into "level of support of compact set × distance of functions + distance of functions on compact set × Lipschitz constant of infradistribution". So, theoretically, if we had the following two results:
Proposition 19: h⋉K is an infradistribution, and preserves all properties indicated in the diagram at the start of this section if h and all the K(x) have said property.
To show this, we'll verify that it's well-defined at all, normalization, monotonicity, concavity, Lipschitzness, compact almost-support, and preservation of the properties.
(h⋉K)(f):=h(λx.K(x)(λy.f(x,y)))
Our first order of business is verifying that
λx.K(x)(λy.f(x,y))
is even a continuous function to be able to show that h can accept it as input.
For continuity, let xn limit to x, and we'll try to show that K(xn)(λy.f(xn,y)) limits to K(x)(λy.f(x,y)). Let λ⊙K be the Lipschitz constant upper bound of K.
Pick an ϵ, we'll show that there's some m0 where
∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
First, note that {xn}n∈N∪{x} is a compact set because xn limits to x. Thus, by the compact-shared compact almost-support condition on an infrakernel, there must be some compact set Cϵ⊆Y where all the K(xn) agree that functions f,f′ agreeing on Cϵ have values only ϵd(f,f′) apart from each other.
Now, because f is a continuous bounded function X×Y→R, it's uniformly continuous when restricted to
({xn}n∈N∪{x})×Cϵ
as this is the product of two compact sets and is compact. Due to the uniform continuity of f restricted to that set, there is some number δ where points only δ apart in that set have their values only differing by ϵ. Further, there is some number m0 where, for all m≥m0, d(xm,x)<δ.
Additionally, the maximum difference between λy.f(x,y) and λy.f(x′,y) is 2||f||.
Now that we know our number m0 we can pick an arbitrary m above it, and go:
∀m≥m0∀y∈Cϵ:d((xm,y),(x,y))=d(xm,x)≤δ
∀m≥m0∀y∈Cϵ:|f(xm,y)−f(x,y)|≤ϵ
∀m≥m0:d((λy.f(xm,y))↓Cϵ,(λy.f(x,y))↓Cϵ)≤ϵ
And now, because these two functions restricted to Cϵ are only ϵ apart, we can apply Lemma 2 to conclude that (since Cϵ and λ⊙K work for all the K(xn))
∀n:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ⋅2||f||+ϵλ⊙K
This argument works for any m≥m0, so we have:
∃m0∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
Letting n=m in particular,
∃m0∀n≥m0:|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
And for each ϵ we can construct an m0 in this way, concluding that
limn→∞|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|=0
Also, from our pointwise convergence condition on infrakernels,
limn→∞K(xn)(λy.f(x,y))=K(x)(λy.f(x,y))
Therefore,
limn→∞K(xn)(λy.f(xn,y))=K(x)(λy.f(x,y))
and so, we now know that
λx.K(x)(λy.f(x,y))
is a continuous function X→R. For boundedness, upper and lower bounds on λy.f(x,y) are ||f|| (and the negative version of it). Due to the shared Lipschitz constant on the K(x), an upper and lower-bound on λx.K(x)(λy.f(x,y)) is λ⊙K||f|| (and the negative version.) Thus, we can safely feed said function into the infradistribution h, so the semidirect product is well-defined. We must still show that it makes an infradistribution.
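On finite spaces (where continuity and boundedness are automatic) the definition can be evaluated directly. The sketch below assumes each infradistribution is modeled as a worst-case expectation over a finite set of probability distributions, which is only one simple family of examples; `make_infra`, `semidirect`, and the toy data are hypothetical names and values.

```python
def make_infra(dists):
    """Toy infradistribution on a finite set: worst-case expectation over dists."""
    def h(f):
        return min(sum(p * f(x) for x, p in d.items()) for d in dists)
    return h


def semidirect(h, K):
    """(h |x K)(f) := h(lambda x. K(x)(lambda y. f(x, y)))."""
    def h_K(f):
        return h(lambda x: K(x)(lambda y: f(x, y)))
    return h_K


# Toy data: X = {0, 1}, Y = {"a", "b"}.
h = make_infra([{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}])
kernels = {
    0: make_infra([{"a": 1.0}, {"a": 0.5, "b": 0.5}]),
    1: make_infra([{"a": 0.25, "b": 0.75}]),
}
K = lambda x: kernels[x]
h_K = semidirect(h, K)

f = lambda x, y: x + (1 if y == "b" else 0)
value = h_K(f)                     # inner function is 0 at x=0 and 1.75 at x=1
unit = h_K(lambda x, y: 1)         # normalization check on the constant 1
zero = h_K(lambda x, y: 0)         # normalization check on the constant 0
```

Here the inner function $\lambda x.K(x)(\lambda y.f(x,y))$ takes the values 0 and 1.75, and $h$ then picks the worse of the two expectations 0.875 and 1.3125, giving 0.875.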
For normalization,
(h⋉K)(1)=h(λx.K(x)(λy.1))=h(λx.1)=1
(h⋉K)(0)=h(λx.K(x)(λy.0))=h(λx.0)=0
For monotonicity, if f′≥f,
∀x:λy.f′(x,y)≥λy.f(x,y)
∀x:K(x)(λy.f′(x,y))≥K(x)(λy.f(x,y))
λx.K(x)(λy.f′(x,y))≥λx.K(x)(λy.f(x,y))
(h⋉K)(f′)=h(λx.K(x)(λy.f′(x,y)))≥h(λx.K(x)(λy.f(x,y)))=(h⋉K)(f)
For concavity,
(h⋉K)(pf+(1−p)f′)=h(λx.K(x)(λy.pf(x,y)+(1−p)f′(x,y)))
≥h(λx.pK(x)(λy.f(x,y))+(1−p)K(x)(λy.f′(x,y)))
≥ph(λx.K(x)(λy.f(x,y)))+(1−p)h(λx.K(x)(λy.f′(x,y)))
=p(h⋉K)(f)+(1−p)(h⋉K)(f′)
In order, this was the definition of the semidirect product, all the K(x) being concave so splitting them up produces a lower value (and then monotonicity for h), then h being concave.
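Monotonicity and concavity can be spot-checked numerically on a finite toy model. This is a sketch under the assumption that infradistributions are modeled as worst-case expectations over finitely many probability distributions; it tests random instances and is not a proof. All names (`make_infra`, `semidirect`, `check_properties`) are hypothetical.

```python
import random


def make_infra(dists):
    """Toy infradistribution on a finite set: worst-case expectation over dists."""
    def h(f):
        return min(sum(p * f(x) for x, p in d.items()) for d in dists)
    return h


def semidirect(h, K):
    """(h |x K)(f) := h(lambda x. K(x)(lambda y. f(x, y)))."""
    return lambda f: h(lambda x: K(x)(lambda y: f(x, y)))


XS, YS = (0, 1), ("a", "b")
h = make_infra([{0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}])
kernels = {
    0: make_infra([{"a": 1.0}, {"a": 0.5, "b": 0.5}]),
    1: make_infra([{"a": 0.25, "b": 0.75}]),
}
h_K = semidirect(h, lambda x: kernels[x])


def random_table(rng, low, high):
    """A random function X x Y -> [low, high], stored as a lookup table."""
    return {(x, y): rng.uniform(low, high) for x in XS for y in YS}


def check_properties(trials=200, seed=0, tol=1e-9):
    """Return True if monotonicity and concavity hold on all sampled instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        f_tab, g_tab = random_table(rng, -1, 1), random_table(rng, -1, 1)
        extra = random_table(rng, 0, 1)  # nonnegative, so f + extra >= f
        p = rng.random()
        f = lambda x, y: f_tab[(x, y)]
        g = lambda x, y: g_tab[(x, y)]
        f_up = lambda x, y: f_tab[(x, y)] + extra[(x, y)]
        mix = lambda x, y: p * f_tab[(x, y)] + (1 - p) * g_tab[(x, y)]
        if h_K(f_up) < h_K(f) - tol:                        # monotonicity
            return False
        if h_K(mix) < p * h_K(f) + (1 - p) * h_K(g) - tol:  # concavity
            return False
    return True


all_ok = check_properties()
```

The tolerance `tol` only absorbs floating-point rounding; the inequalities themselves are the ones derived above.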
This leaves Lipschitzness and CAS. For Lipschitzness, given some f and f′, and letting λ⊙h be the Lipschitz constant of h, we have:
|(h⋉K)(f)−(h⋉K)(f′)|=|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|
≤λ⊙hd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
=λ⊙hsupx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
≤λ⊙hsupxλ⊙Kd(λy.f(x,y),λy.f′(x,y))=λ⊙hλ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))
≤λ⊙hλ⊙Kd(f,f′)
Thus, that final thing shows that there's a finite Lipschitz constant for h⋉K.
This leaves compact almost-support. Pick any ϵ. This induces a compact set CXϵ which is an ϵ-almost-support for h, and then this compact set induces a compact set CYϵ which is an ϵ-almost-support for all the K(x) where x∈CXϵ. Now, for any f and f′ which agree on CXϵ×CYϵ, we can apply Lemma 2 to go:
|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|
≤ϵd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
Pretty much, that first part is the "CXϵ is an ϵ-almost-support for h" piece, and the second piece is the "hey, these two functions may be a bit different on said compact set, we've gotta multiply that by the Lipschitz constant" piece. So, let's work on unpacking these two distances. For the first one, we can go:
d(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
=supx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
≤supxλ⊙Kd(λy.f(x,y),λy.f′(x,y))
=λ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))
=λ⊙Ksupxsupy|f(x,y)−f′(x,y)|
=λ⊙Ksupx,y|f(x,y)−f′(x,y)|=λ⊙Kd(f,f′)
Substituting this back in produces:
≤ϵλ⊙Kd(f,f′)+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
Time to go after the second distance piece. We have:
d((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
=supx∈CXϵ|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
And, because f and f′ agree on CXϵ×CYϵ, we have λy.f(x,y) and λy.f′(x,y) agreeing on CYϵ, which is an ϵ-almost-support for all the K(x) where x∈CXϵ, so we have:
≤supx∈CXϵϵd(λy.f(x,y),λy.f′(x,y))
=ϵsupx∈CXϵsupy|f(x,y)−f′(x,y)|
≤ϵsupx,y|f(x,y)−f′(x,y)|=ϵd(f,f′)
Substituting this back in produces:
≤ϵλ⊙Kd(f,f′)+ϵλ⊙hd(f,f′)
And regrouping this and recapping means that we have:
|(h⋉K)(f)−(h⋉K)(f′)|≤ϵ(λ⊙K+λ⊙h)d(f,f′)
So we have crafted a compact ϵ(λ⊙K+λ⊙h)-almost-support for h⋉K, and we can make ϵ arbitrarily small, so the semidirect product has compact almost-support, which is the last condition we needed.
Time for property verification.
Homogenity:
(h⋉K)(af)=h(λx.K(x)(λy.af(x,y)))=h(λx.aK(x)(λy.f(x,y)))
=ah(λx.K(x)(λy.f(x,y)))=a(h⋉K)(f)
1-Lipschitz: We showed in the Lipschitz section that an upper bound on the Lipschitz constant of h⋉K is the product of the Lipschitz constants of the kernel and the original infradistribution, so 1⋅1=1 and 1-Lipschitzness is preserved.
Cohomogenity:
(h⋉K)(1+af)=h(λx.K(x)(λy.1+af(x,y)))=h(λx.1−a+aK(x)(λy.1+f(x,y)))
=h(λx.1+a(−1+K(x)(λy.1+f(x,y))))=1−a+ah(λx.1−1+K(x)(λy.1+f(x,y)))
=1−a+ah(λx.K(x)(λy.1+f(x,y)))=1−a+a(h⋉K)(1+f)
C-additivity:
(h⋉K)(c)=h(λx.K(x)(λy.c))=h(λx.c)=c
Crispness: Both homogenity and C-additivity are preserved, so crispness is too.
Sharpness:
(h⋉K)(f)=h(λx.K(x)(λy.f(x,y)))=h(λx.infy∈CK(x)f(x,y))
=infx∈Ch(infy∈CK(x)f(x,y))=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)
Our task now is to show that ⋃x∈Ch({x}×CK(x)) is compact, which will take a fair amount of topology work. Our first piece that we'll need is that if xn limits to x, then CK(xn) limits to CK(x) in Hausdorff-distance.
To show this, we'll split it into two parts. First, we'll assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x) and disprove that. Second, we'll assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn), and disprove that.
For the first part, assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). Craft the continuous function
f1:=λy.sup(1−(1/ϵ)infy′∈CK(x)d(y,y′),0)
What this does is it's 1 on the set CK(x), and 0 on anything more than ϵ away from it. One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), so:
limn→∞infy∈CK(xn)f1(y)=infy∈CK(x)f1(y)
The latter term is 1 because f1 is 1 over CK(x). However, because we're assuming that infinitely often, there's a point in CK(xn) that is ϵ away from CK(x), the sequence on the left-hand side is infinitely often 0, so it doesn't converge and we have a contradiction.
For the second part, assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). By compactness of CK(x), we can find finitely many points yi in it s.t. every point in CK(x) is less than ϵ/2 away from one of the yi (cover CK(x) with open balls of radius ϵ/2 centered on points in it and take a finite subcover). Now, for each of these, we can craft a function
fi:=λy.inf(1,(2/ϵ)d(y,yi))
So, this is 0 at the point yi, and 1 at any distance ϵ/2 or more away from it.
One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), and there are finitely many fi, so there's some time where all of them nearly converge, ie:
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
However, infinitely often there's a point yn∈CK(x) that is ϵ away from CK(xn). yn is less than ϵ/2 away from some yi, so that yi can't be closer than ϵ/2 to CK(xn). (If it were closer, then we could pick some point in CK(xn) that's closer than ϵ/2 to yi, and then since yi is less than ϵ/2 away from yn, the triangle inequality would put the distance from yn to CK(xn) below ϵ, an impossibility.)
Because the distance from yi to any point in CK(xn) is at least ϵ/2, then
|K(xn)(fi)−K(x)(fi)|=|infy∈CK(xn)fi(y)−infy∈CK(x)fi(y)|=|1−0|=1
This is because yi∈CK(x) and attains a value of 0 according to fi, while CK(xn) stays away from yi and all its points must have a value of 1. This situation happens infinitely often, which leads to a contradiction with
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
Because infinitely often, one of these fi has very different values, so the sequence is 1 infinitely often and can't limit to 0.
So, we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). And we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). Fixing any ϵ, in the tail of the sequence, CK(x) and CK(xn) are ϵ distance or closer in Hausdorff distance because you can't find points in either set which are far away from the other set. So, CK(xn) limits to CK(x) in Hausdorff-distance when xn limits to x, and we know that x↦CK(x) is a continuous function X→K(Y).
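The two halves of that argument correspond to the two one-sided distances that make up the Hausdorff distance. A minimal Python sketch, assuming finite subsets of the real line and a made-up family of sets (the names `one_sided`, `hausdorff`, `C_K` and `limit_set` are hypothetical), looks like this:

```python
from fractions import Fraction as Fr

def one_sided(A, B):
    """Largest distance from a point of A to the set B (finite subsets of the line)."""
    return max(min(abs(a - b) for b in B) for a in A)

def hausdorff(A, B):
    """Hausdorff distance: the larger of the two one-sided distances."""
    return max(one_sided(A, B), one_sided(B, A))

# Assumed toy family: C_K(x_n) = {0, 1 + 1/n}, with limit C_K(x) = {0, 1}.
limit_set = [Fr(0), Fr(1)]

def C_K(n):
    return [Fr(0), 1 + Fr(1, n)]

# Distances for a few n; in this toy family they shrink as n grows.
distances = [hausdorff(C_K(n), limit_set) for n in (1, 10, 100, 1000)]
```

Here `one_sided(C_K(n), limit_set)` is the quantity controlled by the first part of the proof (points of CK(xn) far from CK(x)), and `one_sided(limit_set, C_K(n))` is the one controlled by the second part.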
This lets us show that the set
⋃x∈Ch({x}×CK(x))
is closed, because if xn limits to x and yn∈CK(xn) and yn limits to y, we have that y∈CK(x) because CK(xn) limits to CK(x) in Hausdorff distance, so we've got closed graph.
Also, by invoking Lemma 3, we know that
⋃x∈ChCK(x)
is compact.
Time to wrap this all up. We know that ⋃x∈Ch{x}×CK(x) is closed in X×Y from our Hausdorff limit argument. This set is also a subset of:
Ch×⋃x∈ChCK(x)
Which is a product of two sets known to be compact, and is compact. It's a closed subset of a compact set, so it's compact. Therefore,
⋃x∈Ch{x}×CK(x)
is a compact set, and from way back,
(h⋉K)(f)=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)
And we've shown that set is compact, so h⋉K where h and all the K(x) are sharp can be written as minimizing over a compact set, so h⋉K is sharp. Thus, semidirect product preserves all the nice properties, and we're finally done with this proof.
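For intuition about the sharpness formula, here is a minimal Python sketch in which all the sets are finite (so compactness is automatic). The sets `C_h` and `C_K` and the helper names are assumptions for illustration only. It compares the nested form h(λx.K(x)(λy.f(x,y))) against a single minimum over the set ⋃x∈Ch({x}×CK(x)).

```python
def sharp(C):
    """Sharp functional: g ↦ minimum of g over the finite set C."""
    return lambda g: min(g(c) for c in C)

C_h = [0, 1, 2]                                  # assumed minimizing set for h
C_K = {0: [0.0, 1.0], 1: [0.5], 2: [2.0, 3.0]}   # assumed minimizing sets for K(x)

h = sharp(C_h)

def semidirect(f):
    """Nested form: h(λx. K(x)(λy. f(x, y)))."""
    return h(lambda x: sharp(C_K[x])(lambda y: f(x, y)))

# The set ⋃_{x ∈ C_h} {x} × C_K(x), listed explicitly.
graph = [(x, y) for x in C_h for y in C_K[x]]

def over_graph(f):
    """Single minimum over the graph set."""
    return min(f(x, y) for x, y in graph)
```

Both functions evaluate f at exactly the same points, so they agree for every f in this finite setting. The topological work in the proof is what makes the same statement go through when the sets are infinite.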
Proposition 20: If all the K(x) are C-additive, then prX∗(h⋉K)=h.
prX∗(h⋉K)(f)=(h⋉K)(f∘prX)=h(λx.K(x)(λy.f(prX(x,y))))
=h(λx.K(x)(λy.f(x)))=h(λx.f(x))=h(f)
This is because, since f(x) doesn't depend on y, it acts as a constant inside K(x) and C-additivity lets us pull it out.
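Proposition 20 can be checked numerically in a finite toy model. The sketch below assumes that h and each K(x) are minima of expectations over hand-picked sets of probability vectors (such functionals send constants to themselves, so the K(x) are C-additive); the data and the names `lower_exp`, `semidirect` and `pushforward_X` are hypothetical.

```python
from fractions import Fraction as Fr

def lower_exp(dists, g):
    """Minimum, over a finite set of probability vectors, of the expectation of g."""
    return min(sum(p * g(i) for i, p in enumerate(d)) for d in dists)

# Toy data (assumed): X = {0, 1}, Y = {0, 1, 2}.
H = [(Fr(1, 2), Fr(1, 2)), (Fr(1, 5), Fr(4, 5))]
K = {0: [(Fr(1), Fr(0), Fr(0)), (Fr(0), Fr(1, 2), Fr(1, 2))],
     1: [(Fr(3, 10), Fr(3, 10), Fr(2, 5))]}

def h(g):
    return lower_exp(H, g)

def semidirect(f):
    return h(lambda x: lower_exp(K[x], lambda y: f(x, y)))

def pushforward_X(g):
    """prX∗(h ⋉ K)(g) = (h ⋉ K)(g ∘ prX): feed in a function that ignores y."""
    return semidirect(lambda x, y: g(x))
```

Since each probability vector sums to 1, K(x)(λy.g(x)) collapses to g(x), which is the C-additivity step used in the proof.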
Proposition 21: If K0,K1,K2... are a sequence of infrakernels of type Kn:∏i=ni=0Xiik→Xn+1, and h is an infradistribution over X0, then (...((h⋉K0)⋉K1)...⋉Km) can be rewritten as h⋉K:m where K:n is an infrakernel of type X0ik→∏i=n+1i=1Xi, recursively defined as K:0:=K0 and K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
So, for our inductive definition,
K:0(x0):=K0(x0)
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
Our task is to show that these are all infrakernels, by induction, and that for any infradistribution h,
(...((h⋉K0)⋉K1)...⋉Kn)=h⋉K:n
For the base case, we observe that K:0 is an infrakernel because it equals K0, which is an infrakernel, and that h⋉K0=h⋉K:0
Time for the induction step. We'll assume that K:n is an infrakernel, and show that K:n+1 is. Further, we need to show that h⋉K:n+1=(h⋉K:n)⋉Kn+1. This will show the result.
Our first requirement is showing that for all x0, K:n+1(x0) is an infradistribution.
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
By our induction assumption, K:n(x0) is an infradistribution as K:n is an infrakernel. Further, λx1:n+1.Kn+1(x0,x1:n+1) is an infrakernel because Kn+1 is and we're just restricting it to a subset of its domain, so it keeps being an infrakernel. And we know from earlier that the semidirect product of an infradistribution and an infrakernel is an infradistribution. So that's taken care of.
Now, we must show a common Lipschitz constant, pointwise function convergence, and compact-shared compact almost-support for K:n+1 to certify that it's an infrakernel.
Starting with common Lipschitz constant, we can just note that, in our proof of Proposition 19, we saw that the Lipschitz constant of the semidirect product was upper-bounded by the product of the Lipschitz constants of the starting infradistribution and the kernel. Assuming that K:n is an infrakernel, we have that the Lipschitz constant of any K:n(x0) is upper-bounded by some λ⊙:n. Also, the Lipschitz constant of any Kn+1(x0,x1:n+1) is upper-bounded by some λ⊙n+1. Thus, λ⊙:nλ⊙n+1 is an upper-bound on the Lipschitz constant of any
K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
infradistribution, which is exactly K:n+1(x0), witnessing that K:n+1 has a uniform upper bound on its Lipschitz constants.
Time to move onto the second one, compact-shared compact almost-support.
For this one, we're trying to prove:
∀CX0,ϵ∃C∏i=n+2i=1Xiϵ⊆∏i=n+2i=1Xi∀x0∈CX0,f,f′:
f↓C∏i=n+2i=1Xiϵ=f′↓C∏i=n+2i=1Xiϵ→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵd(f,f′)
This is the sentence that says that K:n+1 has compact-shared compact almost-support. f and f′ have type signature ∏i=n+2i=1Xi→R.
Now, this is going to be quite complicated, so pay close attention. Fix an arbitrary compact CX0⊆X0, and an arbitrary ϵ. Let λ⊙:n be the Lipschitz constant for the infrakernel K:n, and λ⊙n+1 be the Lipschitz constant for the infrakernel Kn+1.
Due to compact-shared compact-almost-support for K:n which exists by our induction assumption, your set CX0 induces a compact ϵ2λ⊙n+1-almost-support for the family of infradistributions K:n(x0) where x0∈CX0. Call said almost-support C∏i=n+1i=1Xiϵ2λ⊙n+1.
Further, due to compact-shared compact-almost-support for Kn+1 , the set
CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
induces a compact ϵ2λ⊙:n-almost-support for the family of infradistributions Kn+1(x0,x1:n+1) where (x0,x1:n+1)∈CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
Call said almost-support CXn+2ϵ2λ⊙:n
And now let your shared ϵ-almost-support for K:n+1(x0) where x0∈CX0 be:
C∏i=n+1i=1Xiϵ2λ⊙n+1×CXn+2ϵ2λ⊙:n
We must show that said set is indeed a shared ϵ-almost-support for K:n+1(x0) where x0∈CX0. So, let f and f′ agree on said set. Then, we have:
|K:n+1(x0)(f)−K:n+1(x0)(f′)|
=|(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f)−(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f′)|
=|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
This is just unpacking the definition of the iterated semidirect product, no issues here. Now, we use Lemma 2 and the fact that C∏i=n+1i=1Xiϵ2λ⊙n+1 is a ϵ2λ⊙n+1-almost-support for K:n(x0) when x0∈CX0, to get:
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ2λ⊙n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
+ϵ2λ⊙n+1supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
Ok, this is a mess. Let's try to unpack
supx1:n+1∈C∏i=n+1i=1Xiϵ2λ⊙n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
first. What we can do is use that, regardless of what is picked in the supremum, we have:
(x0,x1:n+1)∈CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
So this means that
CXn+2ϵ2λ⊙:n
is a ϵ2λ⊙:n-almost-support for Kn+1(x0,x1:n+1). Further, because f and f′ are identical on
C∏i=n+1i=1Xiϵ2λ⊙n+1×CXn+2ϵ2λ⊙:n
and x1:n+1 was being selected from the former of those, then the functions λxn+2.f(x1:n+1,xn+2) (and the same for f′) agree on CXn+2ϵ2λ⊙:n, the almost-support. So, the supremum is upper-bounded by
≤ϵ2λ⊙:nd(f,f′)
Substituting this back in, we get:
≤λ⊙:nϵ2λ⊙:nd(f,f′)+ϵ2λ⊙n+1supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
Now let's try to unpack
supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
We don't have much leverage on it, besides using the basic Lipschitz constant upper bound, so let's try that.
≤supx1:n+1λ⊙n+1d(λxn+2.f(x1:n+1,xn+2),λxn+2.f′(x1:n+1,xn+2))
=λ⊙n+1d(f,f′)
And substituting this back in, we get:
≤λ⊙:n·(ϵ/(2λ⊙:n))·d(f,f′)+(ϵ/(2λ⊙n+1))·λ⊙n+1·d(f,f′)
=(ϵ/2)d(f,f′)+(ϵ/2)d(f,f′)=ϵd(f,f′)
And so we've shown that the function values are at most ϵ times the distance between the functions apart, so the compact set we cooked up is indeed an ϵ-almost-support for K:n+1(x0) whenever x0∈CX0, and because ϵ and CX0 were arbitrary, we have compact-shared compact-almost-support for K:n+1.
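The bookkeeping behind the choice of the two almost-support levels can be replayed with exact arithmetic. This is only a sketch of the cancellation, with hypothetical names: `lip_prev` stands in for λ⊙:n and `lip_next` for λ⊙n+1.

```python
from fractions import Fraction

def almost_support_bound(eps, lip_prev, lip_next):
    """Coefficient of d(f, f') in the final bound for K:n+1(x0).

    lip_prev plays the role of λ⊙:n, lip_next the role of λ⊙n+1.
    """
    # On the compact set: Lipschitz constant of K:n times the ε/(2λ⊙:n) level
    # of the almost-support chosen for Kn+1.
    on_compact = lip_prev * (eps / (2 * lip_prev))
    # Off the compact set: the ε/(2λ⊙n+1) level of the almost-support chosen
    # for K:n times the Lipschitz bound of Kn+1.
    off_compact = (eps / (2 * lip_next)) * lip_next
    return on_compact + off_compact

bound = almost_support_bound(Fraction(1, 10), Fraction(3), Fraction(7, 2))
```

Each almost-support level is divided by the Lipschitz constant of the other object, which is why both constants cancel and each term contributes exactly half of ϵ.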
Time to move onto the third one, pointwise convergence. If x0,m limits to x0,∞, we want K:n+1(x0,m)(f) to limit to K:n+1(x0,∞)(f). As usual, we use λ⊙n+1 for the Lipschitz constant of Kn+1 and λ⊙:n for the Lipschitz constant of K:n.
To begin with, fix an arbitrary ϵ and bounded continuous function f, and note that {x0,m}m∈N∪{∞} is a compact subset of X0. Because K:n:X0ik→∏i=n+1i=1Xi is assumed to be an infrakernel by induction, {x0,m}m∈N∪{∞} acts as a compact set for it. So, by compact-shared compact-almost-support for K:n, we can find a compact set C∏i=n+1i=1Xiϵ4λ⊙n+1||f|| which is a ϵ/(4λ⊙n+1||f||)-almost-support for all the K:n(x0,m), including K:n(x0,∞).
Also, it is important to note that
λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2))
is a continuous function (as it must be for semidirect products with Kn+1 to have the functions on the inside be continuous). Accordingly, this means that the function:
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
must be uniformly continuous when restricted to the set {x0,m}m∈N∪{∞}×C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
And so, by uniform continuity, given any ϵ, there is some δ difference in inputs which gives rise to a ϵ2λ⊙:n difference in output.
Now, here's what we'll be doing. We'll attempt to show the result that
∀ϵ∃m∗∀m≥m∗:|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|≤ϵ
Straight off the bat, we can apply Lemma 2 to decompose this difference into "starting Lipschitz constant times the difference of the inner functions on the compact set of interest" and "level of almost-support times the difference of the two functions", yielding:
|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
+ϵ4λ⊙n+1||f||supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
Time to start breaking this down. First, to break down
supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
we can realize that the maximum value of one of these would be λ⊙n+1||f||, and the minimum possible value of one of these is −λ⊙n+1||f||, from Lipschitzness of Kn+1, producing an upper bound of:
≤2λ⊙n+1||f||
Substituting this back in, we have:
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|+ϵ4λ⊙n+1||f||2λ⊙n+1||f||
And now, we can use the fact that there is always some δ difference in inputs which gives rise to a ϵ2λ⊙:n difference in output of the function
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
when restricted to the set {x0,m}m∈N∪{∞}×C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
in order to find some m∗ where all future m have x0,m being within δ of x0,∞.
This tiny difference means that the values
Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
and
Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
will only differ by ϵ2λ⊙:n for all x1:n+1 which lie in
C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
Therefore, we have that for all m past m∗,
supx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|≤ϵ2λ⊙:n
And substituting this back in, we have:
≤λ⊙:n·(ϵ/(2λ⊙:n))+(ϵ/(4λ⊙n+1||f||))·2λ⊙n+1||f||=ϵ/2+ϵ/2=ϵ
And ϵ was arbitrary. Therefore we have our desired result that, regardless of bounded continuous function f,
∀ϵ∃m∗∀m≥m∗:|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|≤ϵ
Therefore,
limm→∞|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|=0
These two things limit increasingly close to each other. Further,
limm→∞K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
By pointwise convergence for K:n which is an infrakernel by our induction assumption. Putting these two parts together, we have:
limm→∞K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
so
limm→∞(K:n(x0,m)⋉(λx1:n+1.Kn+1(x0,m,x1:n+1)))(f)
=(K:n(x0,∞)⋉(λx1:n+1.Kn+1(x0,∞,x1:n+1)))(f)
so
limm→∞K:n+1(x0,m)(f)=K:n+1(x0,∞)(f)
And we're done, we showed pointwise convergence for K:n+1 which is the last condition necessary to show it's an infrakernel, and the induction proof goes through to show that all the K:n are infrakernels.
Now all that's left is to show that
h⋉K:n+1=(h⋉K:n)⋉Kn+1
which, together with the base case we already set up, lets the induction go through. We can go:
(h⋉K:n+1)(f)
=h(λx0.K:n+1(x0)(λx1:n+2.f(x0,x1:n+2)))
=h(λx0.(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(λx1:n+2.f(x0,x1:n+2)))
=h(λx0.K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x0,x1:n+1,xn+2))))
=(h⋉K:n)(λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2)))
=((h⋉K:n)⋉Kn+1)(f)
And we're done. Because
h⋉K:n+1=(h⋉K:n)⋉Kn+1
and we know that h⋉K:0=h⋉K0, this means that
∀m:(...((h⋉K0)⋉K1)...⋉Km)=h⋉K:m
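The regrouping in Proposition 21 can be tried out in a tiny finite model. The sketch below assumes sharp functionals (minimum over a finite set) and made-up kernels; `sharp`, `semidirect`, `K0`, `K1` and the test function `f` are all hypothetical. Points of product spaces are represented as nested tuples, so the two sides differ only in how the tuple is bracketed.

```python
def sharp(C):
    """Sharp functional: g ↦ minimum of g over the finite set C."""
    return lambda g: min(g(c) for c in C)

def semidirect(h, K):
    """(h ⋉ K)(f) = h(λa. K(a)(λb. f((a, b)))), with f taking a pair."""
    return lambda f: h(lambda a: K(a)(lambda b: f((a, b))))

# Assumed toy data.
h = sharp([0, 1])                                    # over X0
K0 = lambda x0: sharp([x0, x0 + 1])                  # X0 → functional over X1
K1 = lambda x0, x1: sharp([x0 - x1, x0 * x1, 3])     # X0 × X1 → functional over X2

def f(x0, x1, x2):
    return (x0 - 1) ** 2 + x1 * x2 - x2

# Left: (h ⋉ K0) ⋉ K1, whose points look like ((x0, x1), x2).
left = semidirect(semidirect(h, K0), lambda x01: K1(*x01))
lhs = left(lambda p: f(p[0][0], p[0][1], p[1]))

# Right: h ⋉ K:1 with K:1(x0) = K0(x0) ⋉ (λx1. K1(x0, x1)); points look like (x0, (x1, x2)).
K_colon_1 = lambda x0: semidirect(K0(x0), lambda x1: K1(x0, x1))
right = semidirect(h, K_colon_1)
rhs = right(lambda p: f(p[0], p[1][0], p[1][1]))
```

Unfolding either side by hand gives the same triple-nested expression h(λx0.K0(x0)(λx1.K1(x0,x1)(λx2.f(x0,x1,x2)))), which is the content of the chain of equalities in the proof.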
Proposition 22: K:∞ is an infrakernel (C-additive, specifically) if all the Kn are C-additive infrakernels. It is unchanged by altering the Ci sequence of compact sets. In addition, if all the Kn are homogenous/cohomogenous/crisp/sharp, then K:∞ will be so as well.
So, K:∞:X0ik→∏∞i=1Xi is defined as: Fixing an arbitrary sequence of compact sets Ci⊆Xi,
K:∞(x0)(f):=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
Is it an infrakernel?
This is going to suck unbelievably much, we're gonna need a ton of results. The game plan is:
Part 1: Show that the functions you're feeding into those infrakernels are guaranteed to be continuous, to make some progress towards showing that K:∞ is well-defined.
Part 2: Show that all the K:n are 1-Lipschitz, and also preserve all nice properties we'd want if all the Kn do (homogenity, cohomogenity, C-additivity, crispness, sharpness).
Part 3: Show that if a function only depends on the first n coordinates of the input, then all the K:n+m start agreeing on the expectation value of the function.
Part 4: Give a general procedure for taking a compact subset of the space X0 and making a compact subset of the space ∏∞i=1Xi with nice properties related to compact almost-support, that preserves its nice properties when projected down to any finite stage.
Part 5: Use parts 2, 3, 4, and a complicated chain of reasoning to get a result which implies that it doesn't matter which Ci sequence you pick, the limit will exist and be the same for all of them, so K:∞ actually exists and is well-defined.
Part 6: Using parts 2 and 5, clean up the normalization, monotonicity, concavity, and C-additivity properties of K:∞. Showing that all the K:∞(x0) are C-additive trivially nets the bounded Lipschitz constant property to show that K:∞ is an infrakernel and K:∞(x0) is an infradistribution.
Part 7: Use our trick from Part 4 and our freedom of picking our compact set sequence from Part 5 to show compact-shared compact almost-support for K:∞, netting us the second infrakernel property, and the compact almost-support property for all the individual components of the kernel, verifying the last condition we need to conclude that K:∞(x0) is an infradistribution.
Part 8: We recap one of the arguments for part 5, and it lets us get uniform convergence for a certain limit on any compact set, which is a critical lemma for Part 9.
Part 9: We use our result from Part 8 to invoke the Moore-Osgood theorem in order to show pointwise convergence for K:∞, wrapping up the last condition for it to be an infrakernel.
Part 10: Show that if all the K:n have some nice property, then the limit K:∞ inherits it too.
The proofs will proceed in a strange way to keep track of all the moving parts in places. We'll first present the thing we're trying to prove, and repeatedly go "we could prove it if we could prove this other thing", and keep chaining back until we get something that's easy to show.
Proof Part 1: Our desired result is whether the function λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
is continuous. So, letting xm1:n+1 limit to x∞1:n+1, our task is to show that:
limm→∞infxn+2:∞∈∏∞i=n+2Cif(xm1:n+1,xn+2:∞)=infxn+2:∞∈∏∞i=n+2Cif(x∞1:n+1,xn+2:∞)
Now, what we can do is consider the compact subset of ∏i=∞i=1Xi to be
{xm1:n+1}m∈N∪{∞}×∏∞i=n+2Ci
And then f must be uniformly continuous on it, so given any ϵ, there is some δ where points only δ apart lead to only an ϵ difference in value. You can take m to be big enough to guarantee that all future values of xm1:n+1 are within δ of x∞1:n+1, and then this gets that the function values can only differ by ϵ between (xm1:n+1,xn+2:∞) and (x∞1:n+1,xn+2:∞) if xn+2:∞∈∏∞i=n+2Ci, which it is. This ensures that the worst-case function values are only ϵ apart. This works for all ϵ, showing
limm→∞infxn+2:∞∈∏∞i=n+2Cif(xm1:n+1,xn+2:∞)=infxn+2:∞∈∏∞i=n+2Cif(x∞1:n+1,xn+2:∞)
And so, all the functions we're feeding into the K:n(x0) are continuous.
Proof Part 2: Desired result is "if all the Kn have a nice property, then all the K:n have it too".
This can be simply addressed by noting that, for the base case, because K:0=K0 and we're assuming all the Kn have (C-additivity/cohomogenity/homogenity/crispness/sharpness), K:0 trivially fulfills it.
And for the induction step, if we assume that K:n has the property in question, note that:
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And, by our results on semidirect products preserving nice properties, if K:n(x0) has the nice property (by induction assumption) and Kn+1 does, then we get that K:n+1(x0) preserves the same property, and it holds all the way up the K:n.
And we can move on to Part 3.
Proof Part 3: We show that, if we go far enough out in the K:n, the value assigned to functions which only depend on finitely many inputs stabilizes. The result that we'd like to show at this point is:
∀x0∈X0,n,m∈N,f∈CB(∏i=n+1i=1Xi):K:n+m(x0)(f)=K:n(x0)(f)
Admittedly, f is not of the proper type signature to be evaluated by K:n+m(x0), but we're abusing notation so that we can feed it in anyways and it just ignores all the coordinates it doesn't need. Accordingly, fix an arbitrary x0,n,f, and our proof target will now be:
∀m∈N:K:n+m(x0)(f)=K:n+m+1(x0)(f)
Proving this would let you apply induction, because we have a base case where K:n+0(x0)(f)=K:n(x0)(f). Let m be arbitrary. Then, we can go:
K:n+m+1(x0)(f)=K:n+m(x0)(λx1:n+m+1.Kn+m+1(x0,x1:n+m+1)(λxn+m+2.f(x1:n+1)))
And then, since the function doesn't depend on the choice of xn+m+2, it's a constant and C-additivity of Kn+m+1 lets us pull it out, yielding
=K:n+m(x0)(λx1:n+m+1.f(x1:n+1))=K:n+m(x0)(f)
And we're done.
Proof Part 4: Our desired result here is:
∀CX0⊆X0,ϵ>0:∃C1:∞[CX0,ϵ]⊆∏∞i=1Xi:∀n∈N,x0∈CX0,f,f′∈CB(∏n+1i=1Xi):
f↓pr1:n+1(C1:∞[CX0,ϵ])=f′↓pr1:n+1(C1:∞[CX0,ϵ])
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
This is a bit complicated. It's saying that if you pick any compact subset of X0, you can make a compact subset of ∏∞i=1Xi where the projection of it to coordinates 1 through n+1 acts as a compact ϵ(1−1/2^(n+1))-almost-support for all the K:n(x0) infradistributions when x0 lies in your compact subset of X0. Regardless of what n is.
Accordingly, fix some CX0 and ϵ. Now, we can recursively build up compact subsets of all the Xn in the following way.
CXn+1ϵ2n+1⊆Xn+1 (the superscript names the space, the subscript is the almost-support level ϵ/2^(n+1)) is a ϵ/2^(n+1)-almost-support for all the Kn(x:n) where x:n∈CX0×∏i=ni=1CXiϵ2i. So, basically, we're recursively building up compact subsets of ∏i=ni=0Xi by taking products of earlier compact subsets (with your base case being CX0), and then going "that's a compact subset of the input to Kn, we must be able to find a compact subset of Xn+1 that's a ϵ/2^(n+1)-almost-support for all the Kn(x:n) where x:n lies in our compact subset of input, because of the compact-shared almost-support condition for all the Kn" to go to the next compact set.
To establish some notation to make this a bit easier, let
Ci[CX0,ϵ]:=CXiϵ2i
(the i'th compact set in the sequence, with almost-support level ϵ/2^i, defined with CX0 to start building your sequence), and let
C1:n[CX0,ϵ]:=∏i=ni=1Ci[CX0,ϵ]
(the product of compact sets 1 through n, which is compact)
And let
C1:∞[CX0,ϵ]:=∏∞i=1Ci[CX0,ϵ]
This is the product of all the compact sets, and is compact by Tychonoff's theorem.
Note the dependence of these on the starting compact sets. Notice that the projection of C1:∞[CX0,ϵ] to coordinates 1 through n is exactly C1:n[CX0,ϵ].
Now that this is established, our proof target is (using our new notation):
∀n∈N,x0∈CX0,f,f′∈CB(∏n+1i=1Xi):
f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
This structure naturally suggests an induction proof, so for the base case, let our number be 0. Our proof target then turns into:
∀x0∈CX0,f,f′∈CB(X1):
f↓C1[CX0,ϵ]=f′↓C1[CX0,ϵ]→|K:0(x0)(f)−K:0(x0)(f′)|≤ϵ(1−1/2)d(f,f′)
Using that K:0=K0 and that C1[CX0,ϵ]=CX1ϵ2 (the ϵ/2-almost-support in X1) and ϵ(1−1/2)=ϵ/2, our proof target is now:
∀x0∈CX0,f,f′∈CB(X1):
f↓CX1ϵ2=f′↓CX1ϵ2→|K0(x0)(f)−K0(x0)(f′)|≤ϵ2d(f,f′)
However, we constructed CX1ϵ2 to be a ϵ2-almost-support for all the K0(x0) where x0∈CX0, so this statement is just true, and we're done with our base case.
Now, for the induction step, our proof target is:
∀x0∈CX0,f,f′∈CB(∏n+1i=1Xi):f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−12n+1)d(f,f′)
implies
∀x0∈CX0,f,f′∈CB(∏n+2i=1Xi):f↓C1:n+2[CX0,ϵ]=f′↓C1:n+2[CX0,ϵ]
→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−12n+2)d(f,f′)
Accordingly, assume
∀x0∈CX0,f,f′∈CB(∏n+1i=1Xi):f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−12n+1)d(f,f′)
And our task is to prove
∀x0∈CX0,f,f′∈CB(∏n+2i=1Xi):f↓C1:n+2[CX0,ϵ]=f′↓C1:n+2[CX0,ϵ]
→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−12n+2)d(f,f′)
Therefore let x0,f,f′ be arbitrary, and remember that they have the indicated properties, and that f,f′ agree with each other on the indicated set C1:n+2[CX0,ϵ]. Our proof target is now:
|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−12n+2)d(f,f′)
Unpacking the definition of K:n+1 and rewriting the thing on the end, this is equivalent to (we now take this as the proof target)
|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
≤ϵ(1−1/2^(n+1))d(f,f′)+(ϵ/2^(n+2))d(f,f′)
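The "rewriting the thing on the end" step uses the identity ϵ(1−1/2^(n+2))=ϵ(1−1/2^(n+1))+ϵ/2^(n+2). A small exact-arithmetic sketch (the function names `level` and `step` are hypothetical) confirms the identity for a range of n and shows that the levels never reach ϵ:

```python
from fractions import Fraction

def level(eps, n):
    """ε(1 − 1/2^(n+1)): the almost-support level claimed for K:n."""
    return eps * (1 - Fraction(1, 2 ** (n + 1)))

def step(eps, n):
    """Previous level plus the ε/2^(n+2) contributed by the new compact set."""
    return level(eps, n) + eps / 2 ** (n + 2)

eps = Fraction(3, 7)  # any positive rational works here
identity_holds = all(step(eps, n) == level(eps, n + 1) for n in range(20))
stays_below_eps = all(level(eps, n) < eps for n in range(20))
```

This is the usual geometric-series budget: stage i spends ϵ/2^i, so the total spent through any finite stage stays strictly below ϵ.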
We can apply the Lemma 2 decomposition, to split this into "level of support of compact set times distance of functions + distance of functions on compact set times Lipschitz constant of infradistribution". So, theoretically, if we had the following two results:
∀x1:n+1∈C1:n+1[CX0,ϵ]:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤ϵ2n+2d(f,f′)
and
∀x1:n+1:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤d(f,f′)
then applying Lemma 2 would get us
|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
≤ϵ(1−12n+1