LBIT Proofs 7: Propositions 48-52

Proposition 48: A function $K : X \to □ Y$ is an infrakernel and fulfills the property:
$\forall C_{X} \in K (X) : {K (x) | x \in C_{X}} is b-uniform$
iff the function $K$ is bounded and continuous when $□ Y$ is equipped with the IKR metric.

So, our first direction is to assume that the relevant function is bounded and continuous, and show that it fulfills the infrakernel properties and the one extra compact b-uniformity property. The second direction is to show that, assuming the b-uniformity property and being an infrakernel, the function is bounded and continuous.

Anyways, so let's assume the function $K$ is bounded and continuous. Ie,

${sup}_{x \in X} ({sup}_{f \in C_{B - l i p} (Y)} \frac{| K (x) (f) |}{max (L i (f), | | f | |, 1)}) < \infty$

And, if $x_{n}$ limits to $x$ , then $K (x_{n})$ limits to $K (x)$ in Hausdorff-distance/the infra-KR metric.

First, for purposes of disproof, let's assume $K$ doesn't have bounded Lipschitz constant/amount of measure present. That is,

$\forall λ \exists x \in X, (m, b) \in K (x) : m (1) \geq λ$

Then, you can let $λ$ be any number you want, no matter how large, and find some $x_{λ}$ where $(m_{λ}, b_{λ}) \in K (x_{λ})$ and $m (1) \geq λ$ and $b_{λ} \geq 1$ . Now, let's look at the norm of the infradistribution $K (x_{λ})$ . Plug in the constant function $- b_{λ}$ and we have:

$K (x_{λ}) (- b_{λ}) \leq m_{λ} (- b_{λ}) + b_{λ} \leq - λ b_{λ} + b_{λ}$

This is because $m_{λ}$ had $λ$ or more measure present. Then, making that substitution, we have:

${sup}_{x \in X} ({sup}_{f \in C_{l i p}} \frac{| K (x) (f) |}{max (L i (f), | | f | |, 1)}) \geq {sup}_{f \in C_{l i p}} \frac{| K (x_{λ}) (f) |}{max (L i (f), | | f | |, 1)} \geq \frac{| K (x_{λ}) (- b_{λ}) |}{max (L i (- b_{λ}), | | - b_{λ} | |, 1)}$

And at this point we observe that the Lipschitz constant of a constant is 0, and the norm of a constant is just its absolute value, and $b_{λ} \geq 1$ by assumption, to get:

$= \frac{| K (x_{λ}) (- b_{λ}) |}{b_{λ}}$

Now we use our earlier inequalities and the absolute value (everything in our earlier inequalities is a negative quantity) to get:

$\geq \frac{λ b_{λ} - b_{λ}}{b_{λ}} = λ - 1$

But $λ$ can be unboundedly high, so our maximum norm is infinity, which we stipulated isn't the case. Thus, $K$ does have bounded Lipschitz constant (this was derived from $K$ 's bounded IKR-norm and continuity).

Now, we just need to derive the other two conditions on an infrakernel from boundedness and continuity of $K$ .

If $K$ is a continuous function $X \to □ Y$ , and $C_{X}$ is an arbitrary compact subset of $X$ , then $K (C_{X})$ must be a compact subset of $□ Y$ , which, from our characterization of the iff conditions of compactness in Proposition 46, means that it fulfills the shared compact-almost-support property and the b-uniformity condition. Thus the function $K$ fulfills the compact-shared compact-almost-support property, and the compact b-uniformity condition. Finally, for the pointwise-convergence condition of an infrakernel, continuity of the function $X \to □ Y$ means that if $x_{n}$ limits to $x$ , then $K (x_{n})$ limits to $K (x)$ in Hausdorff-distance, so by Proposition 45, for all continuous bounded functions f, $K (x_{n}) (f)$ limits to $K (x) (f)$ , and this is the last condition we need to call $K$ an infrakernel.

So, bounded and continuous functions $X \to □ Y$ imply that the function is an infrakernel and fulfills the compact b-uniformity condition. Time for the reverse direction. Let's assume that $K$ is an infrakernel $X \to □ Y$ and fufills the compact b-uniformity condition. For showing boundedness, there's some Lipschitz constant upper bound $λ^{⊙}$ , and we can go:

${sup}_{x \in X} ({sup}_{f \in C_{l i p}} \frac{| K (x) (f) |}{max (L i (f), | | f | |, 1)})$

$\leq {sup}_{x \in X} ({sup}_{f \in C_{l i p}} \frac{λ^{⊙} | | f | |}{max (L i (f), | | f | |, 1)})$

$\leq {sup}_{x \in X} ({sup}_{f \in C_{l i p}} λ^{⊙}) = λ^{⊙} < \infty$

And we have established boundedness. For continuity, it will take a more detailed argument. Let $x_{n}$ limit to $x$ . Our task is to show that $K (x_{n})$ limits to $K (x)$ in Hausdorff-distance/infra-KR distance.

Now, the set ${x_{n}}_{n \in N \cup {\infty}}$ is compact, so by the conditions for an infrakernel, there's a bound on the $λ$ value of the minimal points of the $K (x_{n})$ infradistributions, and the $K (x_{n})$ sets have shared compact-almost-supports. We're also assuming the compact b-uniformity condition, so the $K (x_{n})$ sets have the b-uniformity condition on them. By our necessary and sufficient conditions to be precompact, we have that the set of points ${K (x_{n})}_{n \in N \cup {\infty}}$ is precompact in $□ Y$ , so we can isolate a convergent subsequence of the sequence of infradistributions K $(x_{n})$ .

What we'll do is show that all convergent subsequences of $K (x_{n})$ have the same limit point, namely $K (x)$ . Here's the argument for it. Index the convergent subsequence with $m$ and have $K^{\infty}$ be the limiting infradistribution of it. From the pointwise convergence condition on an infrakernel,

$K (x) (f) = {lim}_{n \to \infty} K (x_{n}) (f) = {lim}_{m \to \infty} K (x_{m}) (f) = K^{\infty} (f)$

Thus, our arbitrarily generated limit point $K^{\infty}$ must perfectly match up with $K (x)$ and be identical to it, so since there's only one limit point, the sequence $K (x_{n})$ actually converges to $K (x)$ . So, because convergent sequences in the input induce convergent sequences in the output, the function $K : X \to □ Y$ must be continuous, and we've shown the other direction, that infrakernels fulfilling the compact b-uniformity condition must be bounded and continuous.

Proposition 49: The space $□ X$ equipped with the IKR-topology is a Polish space if $X$ is too.

So, we've got a metric, the infra-KR distance ie the Hausdorff-distance. We know that the notion of convergence remains independent of which metric on $X$ these metrics are being induced by, so the topology is unchanged, and they're all metrizations. We also know from Proposition 43 that $□ X$ is complete under any infra-KR distance ie Hausdorff-distance. So, $□ X$ is completely metrizable by any infra-KR distance induced by a complete metric on $X$ , all that remains is to show that it's a separable space, we need to find a countable dense subset.

We know that a single point is compact, and we also know that compactness implies that for all $ϵ$ , there is a $b^{*}$ value where $d_{h a u} (H, H^{b^{*}}) \leq ϵ$ by Proposition 46, b-uniformity.

So, the space of infradistributions with the $b$ values of their minimal points having an upper bound is dense in the space of infradistributions. All we need to do now is to have a dense countable subset of the space of infradistributions with upper bounds on the $b$ values of their minimal points.

The way we'll be doing this is taking a dense countable subset of the space of a-measures. There are only countably many ways to choose finitely many points from this subset, and a choice of finitely many a-measures can be taken to define an infradistribution by convex hull, closure and upper-completion. Assuming the fiddly details work out, we'll have countably many infradistributions which are (hopefully) dense in the space of infradistributions with bounded $b$ values, which are dense in the space of infradistributions, so we've got our dense countable subset.

There's still much to do. We need to show that there's a dense countable subset of the space of a-measures at all, we need to deal with technicalities involving normalization to show that we always get an infradistribution by this process, and we need to show that given any infradistribution with bounded $b$ values, we can make an infradistribution of this form arbitrarily close to it in Hausdorff-distance.

For our first part, the cone of a-measures can be viewed as $M^{+} (X) \times R^{\geq 0}$ , and you can also view the space of measures over $X$ as $Δ X \times R^{\geq 0}$ (a pair of a distribution and a scaling term). This isn't quite right, because there's an identification of everything when the amount of measure is 0, but this doesn't really matter. So, we're trying to find a dense countable subset of

$Δ X \times R^{\geq 0} \times R^{\geq 0}$

Now, when $X$ is a Polish space, $Δ X$ equipped with the KR-metric is also a Polish space, so there's a dense countable subset. You can take the product of this with the dense countable subset of $R^{\geq 0}$ consisting of the rational numbers 0 and above, to get a dense countable subset of the cone of a-measures.

For the normalization technicalities, we need to make sure that whenever we pick finitely many points from our dense countable subset, there's a point with $b = 0$ , a point with $λ + b = 1$ , and all points have $λ + b \geq 1$ , in order to have normalization work and your batch of points define an actual infradistribution. This can always be done, throwing out the "bad" choices of finitely many points still leaves you with a countable set.

Finally, given an infradistribution $H^{b^{*}}$ with its minimal points all having $b \leq b^{*}$ , can we make infradistributions arbitrarily close to it in Hausdorff-distance of this form where we just specify finitely many points?

Well, yes. Because of the compact-projection property of infradistribution sets, and truncating at $b^{*}$ , we can see that all the minimal points of $H^{b^{*}}$ lie in a compact set. So, given any $ϵ$ , we can cover $H^{b^{*}}$ (well, the part with $b$ values below $b^{*}$ ) with $ϵ$ -sized balls centered at its points, get a finite subcover from compactness, and then use denseness of our countable subset to take our finitely many balls with center points and pick something from the dense countable subset in each ball in order to get: A collection of finitely many points from the dense countable subset where each point in $H^{b^{*}}$ with $b$ value below $b^{*}$ is $ϵ$ distance away from one of these finitely many points, so the closed convex hull of these finitely many points is only $ϵ$ distance away from $H^{b^{*}}$ (but truncated at $b^{*}$ ), and then when upper completion is done, they stay only $ϵ$ distance away from each other.

So, given any infradistribution with bounded $b$ values, and any $ϵ$ , we can find finitely many points from a dense countable subset of the space of a-measures, which induces an infradistribution that's only $ϵ$ distance away. Because there are only countably many ways to pick finitely many points from a countable set, we have a countable set of infradistributions where any infradistribution has infradistributions of this restricted form arbitrarily close to them, and so we have separability, the last condition we need for Polishness.

Proposition 50: The infra-Vietoris topology and the topology induced by the IKR metrics are the same.

This proof will proceed in three ways. The first is showing that this collection of sets declared-to-be-open in the infra-Vietoris topology is closed under finite intersections, and thus forms a basis (instead of a sub-basis). The second way is taking any basis open set from the basis induced by the IKR metric (we'll be using the strongly equivalent Hausdorff-metric for this) and any point/infradistribution in it, and finding a Infra-Vietoris open set from the basis which is a subset of the open ball in the IKR metric. Finally, we taken any basis open set from the infra-Vietoris topology, and any point/infradistribution in such a set, and find a tiny open ball w.r.t. the IKR metric that fits within the infra-Vietoris open set.

So, here's what we're going to do: First, we're going to show that the intersection of two open sets from the basis is also in the basis. Obviously, we only need to show this for pairwise intersection, as that shows closure under finite intersection.

Consider the open set $O_{1}$ of infradistributions to be generated by the finite family of bounded-b open sets $O_{i}$ , and the open set $O_{2}$ of infradistributions to be generated by the finite family of bounded-b open sets $O_{j}$ .

Claim: The finite family of bounded-b open sets $O_{i} \cap (⋃_{j} O_{j}^{u c})$ and $O_{j} \cap (⋃_{i} O_{i}^{u c})$ , for all the $i$ and $j$ , is a suitable batch of open sets in the a-measures to perfectly replicate the $O_{1} \cap O_{2}$ open set of infradistributions.

The properties for $H \in O_{1} \cap O_{2}$ are:

$H \subseteq ⋃_{i} O_{i}^{u c}$

$H \subseteq ⋃_{j} O_{j}^{u c}$

$\forall i : H \cap O_{i} \neq \emptyset$

$\forall j : H \cap O_{j} \neq \emptyset$

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{i} O_{i})$

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{j} O_{j})$

And, the properties for $H$ lying in the set induced by the $O_{i} \cap (⋃_{j} O_{j}^{u c})$ and $O_{j} \cap (⋃_{i} O_{i}^{u c})$ families of open sets (upper completion of open set is open) is:

$H \subseteq ⋃_{i} {(O_{i} \cap ⋃_{j} O_{j}^{u c})}_{i}^{u c} \cup ⋃_{j} {(O_{j} \cap ⋃_{i} O_{i}^{u c})}_{j}^{u c}$

$\forall i : H \cap O_{i} \cap ⋃_{j} O_{j}^{u c} \neq \emptyset$

$\forall j : H \cap O_{j} \cap ⋃_{i} O_{i}^{u c} \neq \emptyset$

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{i} (O_{i} \cap ⋃_{j} O_{j}^{u c}) \cup ⋃_{j} (O_{j} \cap ⋃_{i} O_{i}^{u c}))$

So let's get started on showing equivalence of these things. We'll use $M$ for an a-measure, and use a subscript of $h, i,$ or $j$ to denote which sort of set it's in.

First, let's take the conjunction of these two things.

$H \subseteq ⋃_{i} O_{i}^{u c}$

$H \subseteq ⋃_{j} O_{j}^{u c}$

This is equivalent to "for any $M_{h}$ , it is above some $M_{i}$ and also above some $M_{j}$ ". Since the only notion of upper completion is adding the $+ b$ term, this is equivalent to "for any $M_{h}$ , we are in one of two cases, either there is a $M_{i}, M_{j}$ such that $M_{h} \geq M_{i} \geq M_{j}$ , or that $M_{h} \geq M_{j} \geq M_{i}$ "

Similarly,

$H \subseteq ⋃_{i} {(O_{i} \cap ⋃_{j} O_{j}^{u c})}_{i}^{u c} \cup ⋃_{j} {(O_{j} \cap ⋃_{i} O_{i}^{u c})}_{j}^{u c}$

Is saying "for any $M_{h}$ , we are one of two cases, either there is a $M_{i}, M_{j}$ such that $M_{h} \geq M_{i} \geq M_{j}$ , or vice-versa". And so these two conditions are equivalent.

Also, if you have both of:

$\forall i : H \cap O_{i} \cap ⋃_{j} O_{j}^{u c} \neq \emptyset$

$\forall j : H \cap O_{j} \cap ⋃_{i} O_{i}^{u c} \neq \emptyset$

Then you trivially have

$\forall i : H \cap O_{i} \neq \emptyset$

$\forall j : H \cap O_{j} \neq \emptyset$

And in the reverse direction, if you have:

$H \subseteq ⋃_{i} O_{i}^{u c}$

$H \subseteq ⋃_{j} O_{j}^{u c}$

$\forall i : H \cap O_{i} \neq \emptyset$

$\forall j : H \cap O_{j} \neq \emptyset$

then you'll get both of

$\forall i : H \cap O_{i} \cap ⋃_{j} O_{j}^{u c} \neq \emptyset$

$\forall j : H \cap O_{j} \cap ⋃_{i} O_{i}^{u c} \neq \emptyset$

That just leaves the projection property.

If you have

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{i} O_{i})$

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{j} O_{j})$

then, let's say $(m, 0)$ is your arbitrary point in the closure of the projection of $H$ . This is saying $(m, 0) \leq M_{i}$ and $(m, 0) \leq M_{j}$ . Again, by how upper completion is just adding $b$ , this is equivalent to "given an arbitary point $(m, 0)$ in the closure of the projection of $H$ , either $(m, 0) \leq M_{i} \leq M_{j}$ or $(m, 0) \leq M_{j} \leq M_{i}$ "

And the condition

$¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H) \subseteq p r_{m} (⋃_{i} (O_{i} \cap ⋃_{j} O_{j}^{u c}) \cup ⋃_{j} (O_{j} \cap ⋃_{i} O_{i}^{u c}))$

is saying "given an arbitary point $(m, 0)$ in the closure of the projection of $H$ , either $(m, 0) \leq M_{j} \leq M_{i}$ or $(m, 0) \leq M_{i} \leq M_{j}$ ", which is clearly equivalent. So, we've got our bidirectional equivalence, and any binary (and thus finite) intersection of infra-Vietoris basis opens is an infra-Vietoris basis open, so it is indeed a basis.

Now we must show that this infra-Vietoris topology is identical to the topology induced by the infra-KR metric/Hausdorff distance metric. This can be done by taking any ball in any Hausdorff distance metric centered at a point/infradistribution, and finding an infra-Vietoris open set that contains the same point and is a subset of the open ball. Conversely, we should be able to take any infra-Vietoris open set, and any infradistribution/point within it, and find an IKR open ball centered on that point that lies entirely within the infra-Vietoris open set, showing that the two topologies are equal.

So, first, assuming we've got a Hausdorff-distance of $ϵ$ centered at some infradistribution, can we make an infra-Vietoris set that lies entirely within the ball and engulfs the center point?

Well, there's some $b$ cutoff where the projection of the infradistribution $H$ (cut off at that $b$ value) is within Hausdorff-distance $\frac{ϵ}{4}$ of the projection of the entire infradistribution $H$ to the measure component. Accordingly, we have one of our open sets be a $\frac{ϵ}{2}$ -thickening of the chopped-off infradistribution, as this upper completion engulfs the entire infradistribution $H$ within it with $\frac{ϵ}{2}$ room to spare. The rest is done by covering the compact chopped-off portion with all possible open balls of size $\frac{ϵ}{4}$ , and using compactness of the chopped-off infradistribution to pick a finite subcover/finitely many open balls. $H$ has the projection of its closure lie entirely within the $\frac{ϵ}{2}$ -thickening of the lower part, has nonempty intersection with each open ball, and is a subset of the upper completion of the union of everything.

Now, our task is to show that any infradistribution that lies in the open set induced by that finite collection of open sets lies in the Hausdorff-distance-of- $ϵ$ ball centered at $H$ . So, let our arbitrary new infradistribution $H^{'}$ be a subset of the union of the upper completion of all these open sets, and have nonempty intersection with each of them, and have its closure lie in the projection of the union of them. Note that the initial open set which engulfs the entire bottom portion has all the other open sets as subsets of it, so the union and upper closure of the opens is just the upper completion of the "big" open set. Any point in $H^{'}$ must lie in this "big" open set, and it's made by thickening up $H$ (but chopped off) and taking the upper completion, witnessing that the distance from $H^{'}$ to $H$ is $\frac{ϵ}{2}$ or less.

For the other direction, pick an arbitrary point in $H$ . It's within $\frac{ϵ}{4}$ of the upper completion of one of the finitely many balls used for the open cover. And because those center points are center points of $\frac{ϵ}{4}$ -open balls which $H^{'}$ must have nonempty intersection with, and $H^{'}$ is upper-complete, it has a point of distance $\frac{ϵ}{2}$ or less from your starting point in $H$ . Thus, anything in this basis open set has Hausdorff-distance of $\frac{ϵ}{2}$ or less from the center point $H$ , so all open balls in the infra-KR metric topology can have their central point engulfed by an infra-Vietoris open set, which is a subset of the initial open ball. So we're half-done, we know that all open sets of the infra-KR metric are open in the infra-Vietoris topology.

In the other direction, we need to take an infra-Vietoris open set, and an infradistribution in it, and find some tiny hausdorff-distance from that arbitrary infradistribution that remains in the infra-Vietoris open set. Let $H$ be your infradistribution, which is a subset of the union of the upper completion of the $O_{i}$ , and has nonempty intersection with each $O_{i}$ , and the closure of its projection is a subset of the projections of the $O_{i}$ .

Now, there's one important fact we have to establish first. It is impossible to have a sequence $M_{n} \in H$ that gets arbitrarily close to the complement of $⋃_{i} O_{i}^{u c}$ . Fix some sequence $M_{n} \in H$ and assume that this sequence gets arbitrarily close to the complement of $⋃_{i} O_{i}^{u c}$ .

One of two things can happen. The first thing that can happen is that $M_{n}$ has a subsequence with bounded $b$ value. Then, we can pick out a convergent subsequence of that subsequence because infradistributions, when cut off at any finite $b$ value, are compact, and take the limit point $M_{\infty}$ , which lies in the set $H$ due to closure, and yet has distance 0 from the exterior of $⋃_{i} O_{i}^{u c}$ , which, being the complement of an open set, is closed. So, we've got a point in $H$ and also not in $⋃_{i} O_{i}^{u c}$ , which is impossible.

Therefore, the $M_{n}$ sequence must have unboundedly large $b$ value as $n$ increases. This projects down to make an $m_{n}$ sequence of measures, in the compact set $¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H)$ , and we can isolate a convergent subsequence of those, with limit point $m_{\infty}$ . Now, something interesting happens.

For the $M_{n}$ sequence of a-measures, eventually the $b$ value of them goes well past the largest $b$ value of one of the finitely many bounded- $b$ open sets. And when we go a tiny distance away to a point $M_{n}^{*}$ which lies outside of $⋃_{i} O_{i}^{u c}$ , then... well, it's got nothing from any of the $O_{i}$ below it. And nothing from any of the $O_{i}$ above it either, because the $b$ value of $M_{n}^{*}$ is too high since it's near $M_{n}$ . So, $M_{n}^{*}$ projects down to some measure $m_{n}^{*}$ which lies outside of $p r_{m} (⋃_{i} O_{i})$ . And since $M_{n}^{*}$ is really close to $M_{n}$ , $m_{n}$ (in the closure of the projection of $H$ ) is really close to the exterior of the projection of $⋃_{i} O_{i}$ , as witnessed by $m_{n}^{*}$ . And this is a closed set, because the complement of an open set is closed, and the projection of an open set is open.

This lets us show that our $m_{\infty}$ limit point lies in the complement of $p r_{m} (⋃_{i} O_{i})$ , which is impossible, because $m_{\infty}$ lies in $¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ p r_{m} (H)$ which is a subset of $p r_{m} (⋃_{i} O_{i})$ .

So our original assumption was wrong, and there is some $δ$ where any point in $H$ is always $δ$ or more distance away from the exterior of $⋃_{i} O_{i}^{u c}$ . Further, for the $O_{i}$ , since $H \cap O_{i} \neq \emptyset$ , you should be able to pick said point, and find some $ϵ_{i}$ -sized open ball you can stick around it that lies entirely within the open set $O_{i}$ .

Now, a Hausdorff-distance of $\frac{min (δ, {min}_{i} ϵ_{i})}{2}$ around $H$ , all infradistributions that close must lie in the same Vietoris-open set. Here's why. Since all points in your new infradistribution $H^{'}$ are at most $\frac{min (δ, {min}_{i} ϵ_{i})}{2}$ distance away from a point in $H$ , and given any point in $H$ you need to travel $δ$ distance to get to the exterior of $⋃_{i} O_{i}^{u c}$ , $H^{'}$ is a subset of that open set as well.

Also, $\forall i : H^{'} \cap O_{i} \neq \emptyset$ , because given the central points in the $H \cap O_{i}$ sets, you can travel $\frac{min (δ, {min}_{i} ϵ_{i})}{2}$ distance to get to a point in $H^{'}$ , which lies in the $ϵ_{i}$ -sized open ball around said point which still lies entirely within the $O_{i}$ . So any $H^{'}$ that close must intersect the same collection of open sets.

Finally, the projection thing is taken care of by the same sort of Hausdorff-distance argument we used to argue that said $H^{'}$ must lie in the union of upper-completions of the open sets. So, given any Vietoris-open, and any point in it, we can find an IKR-open which engulfs said point and lies entirely in the Vietoris open.

Therefore, the two topologies are the same.

Theorem 3: Given a crisp infrakernel $K : X i k \to X$ then if there's a $c < 1$ where, regardless of $x$ and $x^{'}$ , $d_{h a u} (K (x), K (x^{'})) \leq c \cdot d (x, x^{'})$ , there's a unique crisp infradistribution $S$ where $K_{*} (S) = S$ , and can be found by starting with any crisp infradistribution on $X$ , and repeatedly applying $K$ .

This theorem is going to require showing a lot of fiddly measure-theoretic stuff, and then once everything's all set up, the actual theorem itself falls pretty fast. So, we're going to start off by noting or proving three things needed to make progress.

The first is: "given any measurable function $s : X \to Δ X$ , and any probability distribution $μ \in Δ X$ , and any $ϵ$ , we have that there is a continuous function $g : X \to Δ X$ s.t. $\int_{X} | | g - s | | d μ < ϵ$ ".

This is an analogue of the celebrated result that continuous functions are dense in the measurable functions. I'm gonna skip small bits of it, to be reconstructed if you ask. Pretty much, we use the fact that $Δ X$ is separable in the KR-metric, and the results from this pdf to get a few results. $Δ X$ is separable according to the KR-metric, and $s$ has all its output values in that separable space, so it's separably valued. Also s is measurable, so by proposition 1.8 in the paper, $s$ is strongly measurable, so it's the pointwise limit of a sequence of simple functions. The actual proof of this fact goes through Theorem 1.5 in that paper, and looking carefully at Theorem 1.5, the simple functions constructed have their value in $Δ X$ , because the proof is like "fix a dense countable subset of your space of interest, then modify $s$ to be a simple function by rounding off the outputs to the nearest of a finite collection of points from that subset". Proposition 1.10 from that paper says that since $s$ is strongly measurable, then given any $μ \in Δ X$ , $s$ is strongly $μ$ -measurable, and then proposition 1.16 from that paper says that $s$ is Bochner-integrable because $\int_{X} | | s (x) | | d μ = 1 < \infty$ , because all the $s (x)$ are probability distributions. Further, part of the proof of proposition 1.16 from that paper shows that the $Δ X$ -valued simple functions, due to their bounded norm, converge to $s$ in the $L^{1}$ norm by the dominated convergence theorem.

Anyways, at this point, we've shown that any measurable function $s : X \to Δ X$ has simple $Δ X$ -valued functions that limit to it, so the simple functions are dense in the measurable functions $X \to Δ X$ . Thus, at this point, we just need to show that we can make continuous functions $X \to Δ X$ that are arbitrarily close to some simple function $χ : X \to Δ X$ .

Let's begin. Our simple functions $χ$ , from looking carefully at the proof of Theorem 1.5 from the linked paper, take the form of partitioning $X$ into finitely many disjoint and covering measurable sets $A_{i}$ , and mapping each of them to some probability distribution $ν_{i} \in Δ X$ . Let $n$ be how many disjoint sets there are.

Thus, our task is to take a simple function $χ : X \to Δ X$ , and find a continuous function $g : X \to Δ X$ where $\int_{X} | | χ (x) - g (x) | | d μ < ϵ$ If we can do this for any $ϵ$ , then it will show that continuous functions are dense in the simple functions which are dense in the measurable functions in $L^{1}$ .

So, here's what you do. Every probability distribution over a Polish space has the property that the measure of any measurable set $A$ can be approximated from below by the measure of a compact subset $C$ . We have finitely many disjoint measurable sets $A_{i}$ , and we associate each of them with a compact subset $C_{i}$ where $μ (A_{i} / C_{i}) < \frac{ϵ}{n}$ .

Let's abuse notation a bit and have $d (x, C_{i})$ be the minimum distance from $x$ to a point in $C_{i}$ . Then, let $g$ be defined as:

$g (x) := \sum_{i \leq n} ⎛ ⎝ \frac{\frac{1}{d (x, C_{i})}}{\sum_{j} \frac{1}{d (x, C_{j})}} ν_{i} ⎞ ⎠$

Admittedly, the distance may be 0, leading to $\frac{\infty}{\infty}$ , however, this will be considered to be 1. The reason this works and is continuous is because all the $C_{i}$ are bounded away from each other. This is because they're all subsets of $A_{i}$ which are disjoint, so all the $C_{i}$ are disjoint, and any two compact sets which are disjoint must be bounded away from each other, because otherwise you could fix a sequence in one compact set getting closer and closer to the other compact set, isolate a convergent subsequence, and it'd limit to be distance 0 from the other compact set, and so in it, contradicting disjointness. Now, we can go:

$\int_{X} | | g (x) - χ (x) | | d μ$

$= \sum_{i \leq n} (\int_{A_{i}} | | g (x) - χ (x) | | d μ)$

$= \sum_{i \leq n} (\int_{C_{i}} | | g (x) - χ (x) | | d μ + \int_{A_{i} / C_{i}} | | g (x) - χ (x) | | d μ)$

And then, for the next step, we observe that when $x \in C_{i}$ , $χ$ is like "well it's in $A_{i}$ , I'll return $ν_{i}$ as my output", and $g$ is like "well, it's distance 0 from $A_{i}$ , so I turn into $\frac{\infty}{\infty} ν_{i} = ν_{i}$ by convention". Further, $g$ always returns probability distributions, and the maximum KR-distance two probability distributions can have (when the original space $X$ has its metric bounded above by 1, which can always be imposed without altering the topology) is 1. So, we get

$\leq \sum_{i \leq n} (\int_{C_{i}} | | ν_{i} - ν_{i} | | d μ + \int_{A_{i} / C_{i}} 1 d μ)$

$= \sum_{i \leq n} (\int_{A_{i} / C_{i}} 1 d μ) = \sum_{i \leq n} (μ (A_{i} / C_{i}))$

$< \sum_{i \leq n} \frac{ϵ}{n} = ϵ$

And we're done. We showed that any measurable function $s : X \to Δ X$ could be arbitrarily closely approximated in $L^{1}$ by a simple function $χ : X \to Δ X$ , and that any simple function could be arbitrarily closely approximated in $L^{1}$ by a continuous function $g : X \to Δ X$ .

The next fact we'll need is that, when $X$ has its distance metric bounded above by 1, which can always be done without altering the topology, then the KR-distance on probability distributions is exactly equal to the Earthmover distance, the first Wasserstein metric.

The third and final fact we need is that, if you have an function $K : X \to K (Δ X)$ (function to the space of compact subsets of $Δ X$ ), which is continuous w.r.t. the Hausdorff metric, and always produces compact convex sets, then the function $ψ : X \times X \to K (Δ X)$ (defined relative to some continuous function $g : X \to Δ X$ and some $ϵ$ ), defined by:

$ψ (x, x^{'}) := ¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ {ν \in K (x^{'}) | d (ν, g (x)) < {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ}$

Is lower-hemicontinuous.

The requirement for being lower-hemicontinuous is if you have some sequence $(x_{n}, x_{n}^{'})$ limiting to $(x, x^{'})$ , and some $ν \in ψ (x, x^{'})$ , then you should be able to find some sequence of points $ν_{n} \in ψ (x_{n}, x_{n}^{'})$ where $ν_{n}$ limits to $ν$ .

First, we'll need that, if $x_{n}^{'}$ limits to $x^{'}$ , then $⋃_{n \in N \cup {\infty}} K (x_{n}^{'})$ is a compact set. We know that $K$ is continuous and ${x_{n}^{'}}_{n \in N \cup {\infty}}$ is compact, so Lemma 3 steps in to show that said set is compact.

The relevant implication of this, when paired with Hausdorff-continuity of $K$ , is that you can take any sequence of distributions $ν_{n} \in K (x_{n}^{'})$ , and find a convergent subsequence that converges to something in $K (x^{'})$ .

The second piece for this argument we'll need is that:

${lim}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n}))) = {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x))$

So, first, we observe that since all the $K (x_{n}^{'})$ are compact, we can find actual minimizers $ν_{n}^{min}$ . We don't know that the limit actually exists, so pass to a subsequence indexed by $m$ where we're taking the liminf. Pass to another subsequence where the $ν_{m}^{min}$ actually converge to a point in $K (x^{'})$ , which we can do by the immediately preceding point about compactness. Then,

${liminf}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n}))) = {lim}_{m \to \infty} ({inf}_{ν^{'} \in K (x_{m}^{'})} d (ν^{'}, g (x_{m})))$

$= {lim}_{m \to \infty} d (ν_{m}^{min}, g (x_{m})) = d (ν^{min}, g (x)) \geq {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x))$

This was done by $g$ being continuous, so $g (x_{m})$ converges to $g (x)$ , and $ν^{min} \in K (x^{'})$ .

Second, fix some minimizer $ν^{min} \in K (x^{'})$ , and by Hausdorff-continuity of $K$ , we can find a sequence $ν_{n} \in K (x_{n}^{'})$ which limits to it. Then,

${limsup}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n}))) \leq {limsup}_{n \to \infty} d (ν_{n}, g (x_{n}))$

$= {lim}_{n \to \infty} d (ν_{n}, g (x_{n})) = d (ν^{min}, g (x)) = {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x))$

And at this point, we can go:

${liminf}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n}))) \geq {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) \geq {limsup}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n})))$

So, we have

${lim}_{n \to \infty} ({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n}))) = {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x))$

Alright, that's part two of the argument for part 3 of the preliminaries for this theorem down. Time for part 3 of the argument. We'll assume that $x_{n}$ limits to $x$ , $x_{n}^{'}$ limits to $x^{'}$ , and pick an arbitrary $ν$ s.t. $ν \in K (x^{'})$ , and

$d (ν, g (x)) < {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ$

Let $ν_{n} \in K (x_{n}^{'})$ be some point in $K (x_{n}^{'})$ that's as close as possible to $ν$ . Obviously, these limit to $ν$ , due to Hausdorff-continuity of $K$ . Non-obviously, a tail of them (all but finitely many), will lie in $ψ (x_{n}, x_{n}^{'})$ . Here's why.

Let $δ$ be:

${inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ - d (ν, g (x))$

There is some finite $n$ by which you are guaranteed three things forever afterwards.

First, $d_{h a u} (K (x_{n}^{'}), K (x^{'})) < \frac{δ}{3}$ , because $K$ is Hausdorff-continuous and $x_{n}^{'}$ limits to $x^{'}$ .

Second, $d (g (x_{n}), g (x)) < \frac{δ}{3}$ , because $g$ is continuous and $x_{n}$ limits to $x$ .

Third,

${inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) < {inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n})) + \frac{δ}{3}$

Because we know that this minimum-distance quantity limits to the minimum-distance quantity of the limit, we showed it above. Now, at this point, we can go:

$d (ν_{n}, g (x_{n})) \leq d (ν_{n}, ν) + d (ν, g (x)) + d (g (x), g (x_{n}))$

$\leq d_{h a u} (K (x_{n}^{'}), K (x^{'})) + d (ν, g (x)) + d (g (x), g (x_{n}))$

$< d (ν, g (x)) + \frac{2 δ}{3}$

$= ({inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ - δ) + \frac{2 δ}{3}$

$< (({inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n})) + \frac{δ}{3}) + ϵ - δ) + \frac{2 δ}{3}$

$= {inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n})) + ϵ$

This was splitting the distance up, observing that $ν_{n}$ was selected to be as close as possible to $ν$ , so it'd be as close or closer than the Hausdorff distance between the sets they were selected from, using that two of the distances were small, using how $δ$ was defined, and using our bound on the minimum-distance quantity. Anyways, this shows that forever after some timestep,

$d (ν_{n}, g (x_{n})) < {inf}_{ν^{'} \in K (x_{n}^{'})} d (ν^{'}, g (x_{n})) + ϵ$

And therefore, since

We have that

$ν_{n} \in ψ (x_{n}, x_{n}^{'})$

Time for our final part! Showing lower-hemicontinuity of $ψ$ requires showing that, for all sequences $(x_{n}, x_{n}^{'})$ limiting to $(x, x^{'})$ , and all $ν \in ψ (x, x^{'})$ , we can find a sequence $ν_{n} \in ψ (x_{n}, x_{n}^{'})$ limiting to $ν$ . We just showed that for $ν$ fulfilling the defining inequality. But we took the closure, didn't we? What if our $ν$ was added in the closure? Then we can take a sequence $ν_{m} \in ψ (x, x^{'})$ (but without the closure part) that does fulfill the defining inequality, and do a thing where we make a sequence $ν_{n, m} \in ψ (x_{n}, x_{n}^{'})$ which limits to $ν_{m}$ , one sequence for each $m$ . Then, we basically proceed for a while in $ν_{n, 0}$ until we have a guarantee that $ν_{n, 1}$ will remain close to $ν_{1}$ , and switch to picking from the sequence $ν_{n, 1}$ , and repeat, slowly ratcheting up the second index over time. Said sequence will get arbitrarily close to the $ν_{m}$ values, which get arbitrarily close to $ν$ itself, so this "step up the sequence index when you've waited long enough" sequence can limit to $ν$

And with this argument, we know that $ψ$ is lower-hemicontinuous.

Alright, now we can start embarking on the true proof instead of just setting everything up. Remember, we assumed that there was a crisp infrakernel $K : X \to Δ X$ , all the $K (x)$ have compact sets of minimal points (by CAS), and that there was some fixed $c < 1$ where $d_{h a u} (K (x)^{min}, K (x^{'})^{min}) \leq c d (x, x^{'})$ regardless of $x$ and $x^{'}$ , and our goal is to show that there's a unique stationary crisp infradistribution which can be found by selecting any crisp infradistribution and repeatedly iterating $K$ .

There's a bit of setup to do first, but thankfully it isn't as demanding as we've done before. First, we'll abuse notation a bit by using $H$ and $H^{'}$ and $K (x)$ and $K (x^{'})$ to refer, not to the infradistribution as a whole, but to their compact convex sets of minimal points, ie, sets of probability distributions.

Second, we recall from the proof of proposition what the form for $K_{*} (H)$ (the pushforward of $H$ ) is. Well, actually, the set of minimal points. It is:

$¯ ¯¯¯¯¯¯ ¯ c . h (⋃_{μ \in H} E_{μ} [K (x)])$

Ie, the closed convex hull of the union of mixtures of the sets of minimal points of $K (x)$ . Mixing sets with regards to a probability distribution is done by fixing a measurable selection function $s : X \to Δ X$ s.t. for all $x$ , $s (x) \in K (x)$ . $E_{μ} [s (x)]$ is a probability distribution, and then you just union all those points together for all the possible selection functions to make your set $E_{μ} [K (x)]$ .

Our final thing to recall (well, actually, look up from Wikipedia), is that since the Earthmover distance/1st Wasserstein metric/KR-distance is all the same, we'll be using an alternate formulation of distance. If $μ, μ^{'}$ are probability distributions over $X$ , then

$d (μ, μ^{'}) = {inf}_{ξ^{'} \in Γ (μ, μ^{'})} \int_{X \times X} d (x, x^{'}) d ξ^{'}$

Ie, of all probability distributions over $X \times X$ , restrict to those which make $μ$ and $μ^{'}$ when you project it down to the appropriate coordinates, and try to pick one that makes the integral of the distance function as low as possible. The lowest integral of distance you can get is the earthmover distance. You can check Wikipedia for this formulation. We'll be using this form once.

Let's begin. First, by the form of the set of minimal points of the pushforward,

$d_{h a u} (K_{*} (H), K_{*} (H^{'}))$

$= d_{h a u} (¯ ¯¯¯¯¯¯ ¯ c . h (⋃_{μ \in H} E_{μ} [K (x)]), ¯ ¯¯¯¯¯¯ ¯ c . h (⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)]))$

And then, we observe that taking the closure doesn't affect the Hausdorff distance at all, so we can go:

$= d_{h a u} (c . h (⋃_{μ \in H} E_{μ} [K (x)]), c . h (⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)]))$

$\leq d_{h a u} (⋃_{μ \in H} E_{μ} [K (x)], ⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)])$

The reason for that inequality is that convex hull doesn't increase the Hausdorff distance. If you've got a finite mix of points from $⋃_{μ \in H} E_{μ} [K (x)]$ in the convex hull, then you could just take each of those, find the closest point from the analogue for $H^{'}$ , and mix those together to make a point close to your thing added in the convex hull, so distance doesn't go up when you do convex hull.

At this point, we have the inequality:

$d_{h a u} (K_{*} (H), K_{*} (H^{'})) \leq d_{h a u} (⋃_{μ \in H} E_{μ} [K (x)], ⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)])$

Our mission is to impose a suitable bound on the second term, and once we've done that, show that $K_{*}$ has type $K (Δ X) \to K (Δ X)$ (maps compact sets of probability distributions to cmpact sets of probability distributions), and then we can apply the Banach fixpoint theorem on $K (Δ X)$ to get a fix-set of probability distributions that doesn't alter when you apply $K_{*}$ (pushforward), and can be found by iteration.

For our first part, imposing a bound on

$d_{h a u} (⋃_{μ \in H} E_{μ} [K (x)], ⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)])$

We shall do the following. We will pick a point in the first set, and find a nearby point in the second set. A point from the first set must be associated with some $μ \in Δ X$ , and some measurable selection function $s : X \to Δ X$ where, for all $x$ , $s (x) \in K (x)$ . Said point is written as $E_{μ} [s (x)]$ , or alternatively, as $\int_{X} s (x) d μ$ . Our task is to find a point in the second set.

Construct it as follows: Select $μ^{'} \in H^{'}$ to be as close to $μ$ as you can get. Pick an $ϵ$ . Then, pick some $ξ \in Δ (X \times X)$ s.t. the projection of $ξ$ to the first coordinate is $μ$ and the projection to the second coordinate is $μ^{'}$ , and:

$\int_{X \times X} d (x, x^{'}) d ξ < {inf}_{ξ^{'} \in Γ (μ, μ^{'})} \int_{X \times X} d (x, x^{'}) d ξ^{'} + ϵ$

Such a $ξ$ can always be found. Pick a continuous function $g : X \to Δ X$ such that

$\int_{X} | | s (x) - g (x) | | d μ < ϵ$

We can always pick such a function because we've already shown that continuous functions are dense in the measurable functions.

At this point, recall our function $ψ : X \times X \to K (Δ X)$ defined as:

We know that said function will be lower-hemicontinuous. Further, it always produces closed sets (we explicitly took the closure), and they're nonempty (just look at it a bit), and are convex. Convexity is a bit more involved, and it occurs because a mix of points a certain distance or less from some set point ( $g (x)$ in this case), will also be the same or less distance away from said point, and the $K (x^{'})$ are all convex. At this point, we can invoke the Michael selection theorem to get a continuous function $g^{'} : X \times X \to Δ X$ where, for all $x, x^{'}$ , $g^{'} (x, x^{'}) \in ψ (x, x^{'})$ .

At this point, remember our distribution $ξ \in Δ (X \times X)$ which projects down to $μ^{'}$ ? Well, by the disintegration theorem (the usual probability theory version), $ξ$ can be written as $μ^{'} ⋉ k^{'}$ where $k^{'} (x^{'}) \in Δ X$ gives the conditional probability distribution over $X$ if you see some $x^{'}$ .

And now we can finally define our measurable selection function $s^{'}$ , it is defined as:
$s^{'} (x^{'}) = \int_{x \in X} g^{'} (x, x^{'}) d k^{'} (x^{'})$

And, our final point that we will pick to be close to $E_{μ} [s (x)]$ will be: $E_{μ^{'}} [s^{'} (x^{'})]$ .

Uh... Is $s^{'}$ even a legit selection function? What we need is that for all $x^{'}$ , $s^{'} (x^{'}) \in K (x^{'})$ . Well, fear not.

$g^{'} (x, x^{'}) \in ψ (x, x^{'}) = ¯ ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯ ¯ {ν \in K (x^{'}) | d (ν, g (x)) < {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ}$

And, because $K (x^{'})$ is closed and we took the closure of a batch of points within it, we have:

$g^{'} (x, x^{'}) \in ψ (x, x^{'}) \subseteq K (x^{'})$

And, because $K (x^{'})$ is closed and compact, making $s^{'} (x^{'})$ by integrating over a bunch of different choices of $x$ to plug in just makes the average of a batch of points within $K (x^{'})$ , which lies in $K (x^{'})$ , so we're good, $s^{'} (x^{'}) \in K (x^{'})$ regardless of $x^{'}$ and it's a legit selection function. Now, we can get into inequality shuffling!

$d (E_{μ} [s (x)], E_{μ^{'}} [s^{'} (x^{'})])$

$= ∣ ∣ ∣ ∣ \int_{X} s (x) d μ - \int_{X} s^{'} (x^{'}) d μ^{'} ∣ ∣ ∣ ∣$

$\leq ∣ ∣ ∣ ∣ \int_{X} s (x) d μ - \int_{X} g (x) d μ ∣ ∣ ∣ ∣ + ∣ ∣ ∣ ∣ \int_{X} g (x) d μ - \int_{X} s^{'} (x^{'}) d μ^{'} ∣ ∣ ∣ ∣$

$= ∣ ∣ ∣ ∣ \int_{X} s (x) d μ - \int_{X} g (x) d μ ∣ ∣ ∣ ∣ + ∣ ∣ ∣ ∣ \int_{X} g (x) d μ - \int_{x^{'} \in X} (\int_{x \in X} g^{'} (x, x^{'}) d k^{'} (x^{'})) d μ^{'} ∣ ∣ ∣ ∣$

And then we can notice something really interesting. We can rewrite that final integral as

$E_{μ^{'}} [λ x^{'} . E_{k^{'} (x^{'})} (λ x . g^{'} (x, x^{'}))]$

Look familiar? That's just the semidirect product!! It's the exact same as

$E_{μ^{'} ⋉ k^{'}} [g^{'} (x, x^{'})]$

And, by our construction, $μ^{'} ⋉ k^{'} = ξ$ . So, writing this as an integral, we have:

$= ∣ ∣ ∣ ∣ \int_{X} s (x) d μ - \int_{X} g (x) d μ ∣ ∣ ∣ ∣ + ∣ ∣ ∣ ∣ \int_{X} g (x) d μ - \int_{X \times X} g^{'} (x, x^{'}) d ξ ∣ ∣ ∣ ∣$

Also, because $ξ$ projects to $μ$ in the first coordinate, we can write our integral over $μ$ as an integral over $ξ$ instead, it doesn't change the value.

$= ∣ ∣ ∣ ∣ \int_{X} s (x) d μ - \int_{X} g (x) d μ ∣ ∣ ∣ ∣ + ∣ ∣ ∣ ∣ \int_{X \times X} g (x) d ξ - \int_{X \times X} g^{'} (x, x^{'}) d ξ ∣ ∣ ∣ ∣$

$= ∣ ∣ ∣ ∣ \int_{X} (s (x) - g (x)) d μ ∣ ∣ ∣ ∣ + ∣ ∣ ∣ ∣ \int_{X \times X} (g (x) - g^{'} (x, x^{'})) d ξ ∣ ∣ ∣ ∣$

And move the norm inside to get:

$\leq \int_{X} | | s (x) - g (x) | | d μ + \int_{X \times X} | | g (x) - g^{'} (x, x^{'}) | | d ξ$

We know that $s$ and $g$ are close w.r.t. $μ$ , so we can get:

$< ϵ + \int_{X \times X} | | g (x) - g^{'} (x, x^{'}) | | d ξ$

Now, we should observe something interesting. Because $g^{'} (x, x^{'}) \in ψ (x, x^{'})$ , and using how $ψ (x, x^{'})$ was defined, we have that:

$d (g^{'} (x, x^{'}), g (x)) \leq {inf}_{ν^{'} \in K (x^{'})} d (ν^{'}, g (x)) + ϵ$

Using the equivalence between distance of two points and the norm of subtracting the two points, we have:

$\leq ϵ + \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} | | ν^{'} - g (x) | | + ϵ) d ξ$

$= \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} | | ν^{'} - g (x) | |) d ξ + 2 ϵ$

And then we can use another distance inequality to go:

$\leq \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} (| | ν^{'} - s (x) | | + | | s (x) - g (x) | |)) d ξ + 2 ϵ$

And now, since the latter term isn't affected by the inf, we can pull it out, and get:

$= \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} | | ν^{'} - s (x) | |) d ξ + \int_{X \times X} | | s (x) - g (x) | | d ξ + 2 ϵ$

Again, the projection of $ξ$ to the first coordinate is $μ$ , so we have:

$= \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} | | ν^{'} - s (x) | |) d ξ + \int_{X} | | s (x) - g (x) | | d μ + 2 ϵ$

And, because $s$ and $g$ are close w.r.t $μ$ , we have:

$< \int_{X \times X} ({inf}_{ν^{'} \in K (x^{'})} | | ν^{'} - s (x) | |) d ξ + 3 ϵ$

And then... $s (x) \in K (x)$ , and $ν^{'}$ is as close as possible within $K (x^{'})$ , so we have:

$\leq \int_{X \times X} d_{h a u} (K (x), K (x^{'})) d ξ + 3 ϵ$

And we know that, as one of our starting assumptions, $d_{h a u} (K (x), K (x^{'})) \leq c d (x, x^{'})$ where $c$ is a constant below 1. So,

$\leq \int_{X \times X} c d (x, x^{'}) d ξ + 3 ϵ$

$= c \int_{X \times X} d (x, x^{'}) d ξ + 3 ϵ$

But wait, we selected \xi such that:

$\int_{X \times X} d (x, x^{'}) d ξ < {inf}_{ξ^{'} \in Γ (μ, μ^{'})} \int_{X \times X} d (x, x^{'}) d ξ^{'} + ϵ$

So, making that substitution, we have:

$< c ({inf}_{ξ^{'} \in Γ (μ, μ^{'})} \int_{X \times X} d (x, x^{'}) d ξ^{'} + ϵ) + 3 ϵ$

$< c ({inf}_{ξ^{'} \in Γ (μ, μ^{'})} \int_{X \times X} d (x, x^{'}) d ξ^{'}) + 4 ϵ$

And this is just the definition of earthmover distance, so we have:

$= c d (μ, μ^{'}) + 4 ϵ$

And, $μ^{'}$ was selected from $H^{'}$ to be as close as possible to $μ$ , so we've got one last inequality.

$\leq c d_{h a u} (H, H^{'}) + 4 ϵ$

Our final inequality, putting all this together, is that

$d (E_{μ} [s (x)], E_{μ^{'}} [s^{'} (x^{'})]) < c d_{h a u} (H, H^{'}) + 4 ϵ$

So, given an arbitrary point in $H$ , we can craft a point in $H^{'}$ that isn't too far away, and switch $H^{'}$ and $H$ to get:

$d_{h a u} (⋃_{μ \in H} E_{μ} [K (x)], ⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)]) < c d_{h a u} (H, H^{'}) + 4 ϵ$

But wait, $ϵ$ was arbitrary! So we can shrink it to 0 to get:

$d_{h a u} (⋃_{μ \in H} E_{μ} [K (x)], ⋃_{μ^{'} \in H^{'}} E_{μ^{'}} [K (x)]) \leq c d_{h a u} (H, H^{'})$

Putting this bound together with what we were working on at the start,

$d_{h a u} (K_{*} (H), K_{*} (H^{'})) \leq c d_{h a u} (H, H^{'})$

Or, alternately,

$\frac{d_{h a u} (K_{*} (H), K_{*} (H^{'}))}{d_{h a u} (H, H^{'})} \leq c < 1$

So, $K_{*}$ is definitely a contraction mapping. But we do need to check that if $H$ is a compact set of probability distributions, then $K_{*} (H)$ is compact as well. Fortunately, this is doable because the pushforward of an infradistribution is an infradistribution, and you can chop off an infradistribution at any $+ b$ level and get a compact set of a-measures. So if we chop it off at 0, we get exactly what we want.

The Hausdorff-limit of a series of convex compact sets is convex compact, and we know that for all $H, H^{'}$ that are compact convex,

$\frac{d_{h a u} (K_{*} (H), K_{*} (H^{'}))}{d_{h a u} (H, H^{'})} \leq c < 1$

So, $K_{*}$ is a contraction mapping on this closed complete metric space (the space of compact subsets of a complete space is complete), and we can apply the Banach fixpoint theorem to find a unique fixset $S$ s.t. $S = K_{*} (S)$ . $S$ is a convex compact set of probability distributions, so it corresponds to a crisp infradistribution that can be found by taking any crisp infradistribution and iterating $K_{*}$ (the pushfoward), and we're done!

Proposition 51: If $h$ is the infradistribution corresponding to a probability distribution $μ$ , then if $α \in (1, \infty)$ , $H_{α} (h) = H_{α} (μ)$ , where the latter $H_{α}$ is the conventional definition of Renyi entropy for probability distributions, and the former $H_{α}$ is the definition of Renyi Entropy for infradistributions.

For an infradistribution, the Renyi entropy would be:

$H_{α} (h) = \frac{α}{1 - α} ln ({max}_{q} h (q_{i}^{\frac{α - 1}{α}}))$

But it's a probability distribution corresponding to $μ$ , so the $h$ turns into an expectation.

$= \frac{α}{1 - α} ln ({max}_{q} (\sum_{i} μ_{i} q_{i}^{\frac{α - 1}{α}}))$

Note that $q \in Δ X$ , it's a probability distribution. To solve the maximization, we will be using Lagrange multipliers. We maximize $\sum_{i} μ_{i} q_{i}^{\frac{α - 1}{α}}$ subject to the constraint that $\sum_{i} q_{i} - 1 = 0$ . Thus, we make the function

$\sum_{i} μ_{i} q_{i}^{\frac{α - 1}{α}} - λ (\sum_{i} q_{i} - 1)$

and set all the partial derivatives w.r.t. the $q_{i}$ and $λ$ to 0, so we get a series of equations of the form:

$\frac{α - 1}{α} μ_{n} q_{n}^{\frac{- 1}{α}} - λ = 0$

for the various $q$ 's, and for $λ$ , we have:

$\sum_{i} q_{i} - 1 = 0$

And then, if we can derive what the various $q_{n}$ and what $λ$ is, we can conclude that those maximize our function. Solving this yields:

$q_{n} = \frac{μ_{n}^{α}}{\sum_{i} μ_{i}^{α}}$

$λ = \frac{α - 1}{α} {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}}$

Now, for the partial derivative w.r.t. $λ$ , we have:

$\sum_{i} q_{i} - 1 = \sum_{i} \frac{μ_{i}^{α}}{\sum_{i} μ_{i}^{α}} - 1 = \frac{\sum_{i} μ_{i}^{α}}{\sum_{i} μ_{i}^{α}} - 1 = 1 - 1 = 0$

and for the partial derivative w.r.t. $q_{n}$ , we have:

$\frac{α - 1}{α} μ_{n} q_{n}^{\frac{- 1}{α}} - λ = \frac{α - 1}{α} μ_{n} {(\frac{μ_{n}^{α}}{\sum_{i} μ_{i}^{α}})}^{\frac{- 1}{α}} - \frac{α - 1}{α} {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}}$

$= \frac{α - 1}{α} ⎛ ⎝ μ_{n} {(\frac{μ_{n}^{α}}{\sum_{i} μ_{i}^{α}})}^{\frac{- 1}{α}} - {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}} ⎞ ⎠$

$= \frac{α - 1}{α} ⎛ ⎜ ⎝ μ_{n} \frac{μ_{n}^{- 1}}{{(\sum_{i} μ_{i}^{α})}_{i}^{\frac{- 1}{α}}} - {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}} ⎞ ⎟ ⎠$

$= \frac{α - 1}{α} ⎛ ⎜ ⎝ \frac{1}{{(\sum_{i} μ_{i}^{α})}_{i}^{\frac{- 1}{α}}} - {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}} ⎞ ⎟ ⎠$

$= \frac{α - 1}{α} ({(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}} - {(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}}) = 0$

So, these are legit. Now, when we substitute these values into our maximization problem, we get:

$\frac{α}{1 - α} ln ({max}_{q} (\sum_{i} μ_{i} q_{i}^{\frac{α - 1}{α}}))$

$= \frac{α}{1 - α} ln ⎛ ⎝ \sum_{n} μ_{n} {(\frac{μ_{n}^{α}}{\sum_{i} μ_{i}^{α}})}^{\frac{α - 1}{α}} ⎞ ⎠$

$= \frac{α}{1 - α} ln ⎛ ⎜ ⎝ \sum_{n} μ_{n} ⎛ ⎜ ⎝ \frac{μ_{n}^{α - 1}}{{(\sum_{i} μ_{i}^{α})}_{i}^{\frac{α - 1}{α}}} ⎞ ⎟ ⎠ ⎞ ⎟ ⎠$

$= \frac{α}{1 - α} ln ⎛ ⎜ ⎝ \frac{\sum_{i} μ_{i}^{α}}{{(\sum_{i} μ_{i}^{α})}_{i}^{\frac{α - 1}{α}}} ⎞ ⎟ ⎠$

$= \frac{α}{1 - α} ln ⎛ ⎜ ⎝ \frac{1}{{(\sum_{i} μ_{i}^{α})}_{i}^{\frac{- 1}{α}}} ⎞ ⎟ ⎠$

$= \frac{α}{1 - α} ln ({(\sum_{i} μ_{i}^{α})}_{i}^{\frac{1}{α}})$

$= \frac{1}{1 - α} ln (\sum_{i} μ_{i}^{α}) = H_{α} (μ)$

Which is the standard formulation of Renyi entropy for a probability distribution.

Proposition 52: ${lim}_{α \to \infty} H_{α} (h)$ exists for all infradistributions $h$ , and equals $- ln ({sup}_{q} h (q))$

Our task will be to bound the difference between

$\frac{α}{1 - α} ln ({sup}_{q} (h (q^{\frac{α - 1}{α}})))$

and

$- ln ({sup}_{q} (h (q)))$

as $α \to \infty$ .

Well, for this endeavor, our first task is to bound the distance between the an arbitrary function $q$ and the function $q^{\frac{α - 1}{α}}$ , for sufficiently large $α$ .

Remember, the distance between these functions is ${sup}_{i} | q_{i}^{\frac{α - 1}{α}} - q_{i} |$
Now, since all our functions are bounded in $[0, 1]$ , we're raising a number below 1 to a power slightly below 1, so it actually stays the same or gets larger. So, we don't need the absolute value here.

Thus, to figure out what the largest possible value of this quantity is for a given $α$ , let's let $q_{i}$ be our variable, and $α$ be our constant. Then we differentiate, set to 0 solve the equation for $q_{i}$ and substitute that back in to find out the largest possible value that this quantity can be, and thus the largest possible difference between the functions $q^{\frac{α - 1}{α}}$ and $q$ , as a function of $α$ .

Differentiating $q_{i}^{\frac{α - 1}{α}} - q_{i}$ w.r.t. $q_{i}$ and setting it equal to 0, we must solve for $q_{i}$ in the following equation:

$\frac{α - 1}{α} q_{i}^{\frac{- 1}{α}} - 1 = 0$

The value of $q_{i}$ that solves this equation is:

$q_{i} = {(\frac{α - 1}{α})}^{α}$

as can be seen by substituting this back in. This is our $q_{i}$ which attains the maximum value (no weird stuff going on, as shown by graphing the function), and we're raising a number below 1 to a high power, so it's in $[0, 1]$ , and thus a permissible function.

Substituting this back into $q_{i}^{\frac{α - 1}{α}} - q_{i}$ , we get:

$= q_{i} (q_{i}^{\frac{- 1}{α}} - 1)$

$= {(\frac{α - 1}{α})}^{α} ({({(\frac{α - 1}{α})}^{α})}^{\frac{- 1}{α}} - 1)$

$= {(\frac{α - 1}{α})}^{α} ({(\frac{α - 1}{α})}^{- 1} - 1)$

$= {(\frac{α - 1}{α})}^{α} (\frac{α}{α - 1} - \frac{α - 1}{α - 1})$

$= {(\frac{α - 1}{α})}^{α} (\frac{1}{α - 1})$

And, this is the maximum possible difference between the function $q$ and $q^{\frac{α - 1}{α}}$ , as a function of $α$ .

Therefore,

$\forall q : d (q^{\frac{α - 1}{α}}, q) \leq {(\frac{α - 1}{α})}^{α} (\frac{1}{α - 1}) \leq \frac{1}{α - 1}$

Now, due to Lipschitzness of $h$ , in the $α \to \infty$ limit,

$| h (q) - h (q^{\frac{α - 1}{α}}) |$

limits to 0 uniformly in $q$ . Therefore, in the $α \to \infty$ limit,

${sup}_{q} (h (q^{\frac{α - 1}{α}}))$ limits to ${sup}_{q} h (q)$ .

Therefore, $ln$ of these quantities limit to each other. And $\frac{α}{1 - α}$ limits to -1.

Accordingly, in the $α \to \infty$ limit,

$\frac{α}{1 - α} ln ({sup}_{q} (h (q^{\frac{α - 1}{α}})))$

limits to

$- ln ({sup}_{q} (h (q)))$

and so, we can define the Renyi entropy for $α = \infty$ .

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

4

LBIT Proofs 7: Propositions 48-52

4