Proposition 39: Given a crisp infradistribution over , an infrakernel from to infradistributions over , and suggestively abbreviating as (hypothesis ) and as (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then
Proof: Assume that and are functions of type and respectively, ie, likeliehood and utility doesn't depend on which hypothesis you're in, just what happens. First, unpack our abbreviations and what an update means.
Then use the definition of an infrakernel pushforward.
For the next thing, we're just making types a bit more explicit, only depend on , not .
Then we pack the semidirect product back up.
And pack the update back up.
At this point, we invoke the Infra-Disintegration Theorem.
We unpack what our new modified prior is, via the Infra-Disintegration Theorem.
and unpack the semidirect product.
Now we unpack and .
And unpack what is
And reabbreviate as ,
And then pack it back up into a suggestive form as a sort of expectation.
And we're done.
Proposition 40: If a likelihood function is 0 when , and and , then
And then we apply Markov's inequality, that for any probability distribution,
Also, (because is 0 when ), so monotonicity means that
So, we can get:
And we're done.
Proposition 41: The IKR-metric is a metric.
So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular (instead of the supremum), it would fulfill the triangle inequality, and then it's an easy exercise for the reader to verify that the same property applies to the supremum, so the only tricky part is the reverse direction of identity of indiscernibles, that two infradistributions which have a distance of 0 are identical.
First, if , then and must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions, and must perfectly agree on all the uniformly continuous functions.
Now, we're going to need a somewhat more sophisticated argument. Let's say that the sequence is uniformly bounded and limits to in equipped with the compact-open topology (ie, we get uniform convergence of to on all compact sets). Then, for any infradistributions, will limit to . Here's why. For any , there's some compact set that accounts for almost all of why a function inputted into an infradistribution has the value it does. Then, what we can do is realize that will, in the limit, be incredibly close to , due to and disagreeing by a bounded amount outside the set and only disagreeing by a tiny amount on the set , and the Lipschitzness of .
Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when is equipped with the compact-open topology, so given any function , we can find a sequence of uniformly continuous functions limiting to in the compact-open topology, and then,
And so, and agree on all continuous functions, and are identical, if they have a distance of 0, giving us our last piece needed to conclude that is a metric.
Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.
Let's show both directions of this. For the first one, if the Hausdorff-distance between is , then for all a-measures in , there's an a-measure in that's only or less distance away, according to the KR-metric (on a-measures).
Now, by LF-duality, a-measures in H correspond to hyperplanes above . Two a-measures being apart means, by the definition of the KR-metric for a-measures, that they will assign values at most distance apart for 1-Lipschitz functions in .
So, translating to the concave functional view of things, and being apart means that every hyperplane above h has another hyperplane above that can only differ on the 1-Lipschitz 1-bounded functions by at most , and vice-versa.
Let's say we've got a Lipschitz function . Fix an affine functional/hyperplane that touches the graph of at . Let's try to set an upper bound on what can be. If is 1-Lipschitz and 1-bounded, then we can craft a above that's nearby, and
Symmetrically, we can swap and to get , and put them together to get:
For the 1-Lipschitz functions.
Let's tackle the case where is either more than 1-Lipschitz, or strays outside of . In that case, is 1-Lipschitz and bounded in . We can craft a that only differs on 1-Lipschitz functions by or less. Then, since, for affine functionals, and using that and are close on 1-Lipschitz functions, which and 0 are, we can go:
And then we swap out for with a known penalty in value, we're taking an overestimate at this point.
This argument works for all . And, even though we just got an upper bound, to rule out being significantly below , we could run through the same upper bound argument with instead of , to show that can't be more than above .
So, for all Lipschitz , . Thus, for all Lipschitz ,
This establishes one part of our inequalities. Now for the other direction.
Here's how things are going to work. Let's say we know the IKR-distance between and . Our task will be to stick an upper bound on the Hausdorff-distance between and . Remember that the Hausdorff-distance being low is equivalent to "any hyperplane above has a corresponding hyperplane above that attains similar values on the 1-or-less-Lipschitz functions".
So, let's say we've got , and a . Our task is, knowing , to craft a hyperplane above that's close to on the 1-Lipschitz functions. Then we can just swap and , and since every hyperplane above is close (on the 1-Lipschitz functions) to a hyperplane above , and vice-versa, and can be shown to be close. We'll use Hahn-Banach separation for this one.
Accordingly, let the set be the set of where , and:
That's... quite a mess. It can be thought of as the convex hull of the hypograph of , and the hypograph of restricted to the 1-Lipschitz functions in and shifted down a bit. If there was a that cuts into and scores lower than it, ie , we could have , and to observe that cuts into the set . Conversely, if an affine functional doesn't cut into the set , then it lies on-or-above the graph of .
Similarly, if undershoots over the 1-or-less-Lipschitz functions in , it'd also cut into . Conversely, if the hyperplane doesn't cut into , then it sticks close to over the 1-or-less-Lipschitz functions.
This is pretty much what is doing. If we don't cut into it, we're above and not too low on the functions with a Lipschitz norm of 1 or less.
For Hahn-Banach separation, we must verify that is convex and open. Convexity is pretty easy.
First verification: Those numbers at the front add up to 1 (easy to verify), are both in (this is trivial to verify), and + isn't 1 (this is a mix of two numbers that are both below , so this is easy). Ok, that condition is down. Next up: Is our mix of and 1-Lipschitz and in ? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of and still in ? Yup.
That leaves the conditions on the b terms. For the first one, just observe that mixing two points that lie strictly below (a hyperplane) lies strictly below it as well. For the second one, since is concave, mixing two points that lie strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when is 0, in which case, we can have our new and be anything we want as long as it fulfills the conditions, it still defines the same point (because that term gets multiplied by 0 anyways). So is convex.
But... is open? Well, observe that the region under the graph of on is open, due to Lipschitzness of . We can wiggle and around a tiny tiny little bit in any direction without matching or exceeding the graph of . So, given a point in , fix your tiny little open ball around . Since can't be 1, when you mix with , you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn't collapse to a point), making a little tiny open ball around your arbitrarily chosen point in . So is open.
Now, let's define a that should be convex, so we can get Hahn-Banach separation going (as long as we can show that and are disjoint). It should be chosen to forbid our separating hyperplane being too much above over the 1-or-less Lipschitz functions. So, let be:
Obviously, cutting into this means your hyperplane is too far above over the 1-or-less-Lipschitz functions in . And it's obviously convex, because 1-or-less-Lipschitz functions in are a convex set, and so is the region above a hyperplane .
All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We'll assume there's a point in both of them and derive a contradiction. So, let's say that is in both and . Since it's in ,
But also, with the 's and 's and fulfilling the appropriate properties, because it's in . Since and , we'll write as and as , where and are nonzero. Thus, we rewrite as:
We'll be folding into a single term so I don't have to write as much stuff. Also, is an affine function, so we can split things up with that, and make:
Remember, because . So, we get:
And, if , we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, , and we can rewrite as:
And now, we should notice something really really important. Since can't be , does consistute a nonzero part of , because .
However, is a 1-or-less Lipschitz function, and bounded in , due to being in ! If wasn't Lipschitz, then given any slope, you could find areas where it's ascending faster than that rate. This still happens when it's scaled down, and can only ascend or descend at a rate of 1 or slower there since it's 1-Lipschitz as well. So, in order for to be 1-or-less Lipschitz, must be Lipschitz as well. Actually, we get something stronger, if has a really high Lipschitz constant, then needs to be pretty high. Otherwise, again, wouldn't be 1-or-less Lipschitz, since of it is composed of , which has areas of big slope. Further, if has a norm sufficiently far away from 0, then needs to be pretty high, because otherwise f wouldn't be in , since of it is composed of which has areas distant from 0.
Our most recent inequality (derived under the assumption that there's a point in and ) was:
Assuming hypothetically were were able to show that
then because , we'd get a contradiction, showing that and are disjoint. So let's shift our proof target to trying to show
Let's begin. So, our first order of business is that
This should be trivial to verify, remember that .
Now, , and is 1-Lipschitz, and so is . Our goal now is to impose an upper bound on the Lipschitz constant of . Let us assume that said Lipschitz constant of is above 1. We can find a pair of points where the rise of from the first point to the next, divided by the distance between the points is exceptionally close to the Lipschitz constant of , or equal. If we're trying to have slope up as hard as it possibly can while mixing to make , which is 1-Lipschitz, then the best case for that is one where is sloping down as hard as it can, at a rate of -1. Therefore, we have that
Ie, mixing sloping up as hard as possible and sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this equation, we get:
We can run through almost the same exact argument, but with the norm of . Let us assume that said norm is above 1. We can find a point where attains its maximum/minimum, whichever is further from 0. Now, if you're trying to have be as negative/positive as it possibly can be, while mixing to make , which lies in , then the best case for that is one where is as positive/negative as it can possibly be there, ie, has a value of -1 or 1. In both cases, we have:
Now we can proceed. Since we established that all three of these quantities (1, Lipschitz constant, and norm) are upper bounded by , we have:
And we have exactly our critical
inequality necessary to force a contradiction. Therefore, and must be disjoint. Since is open and convex, and is convex, we can do Hahn-Banach separation to get something that touches and doesn't cut into .
Therefore, we've crafted a that lies above , and is within of over the 1-or-less-Lipschitz functions in , because it doesn't cut into and touches .
This same argument works for any , and it works if we swap and . Thus, since hyperplanes above the graph of an infradistribution function or correspond to points in the corresponding and , and we can take any point in /affine functional above and make a point in /affine functional above (and same if the two are swapped) that approximately agree on , there's always a point in the other infradistribution set that's close in KR-distance and so and have
And with that, we get
And we're done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.
Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space is complete under .
So, the space of closed subsets of is complete under the Hausdorff metric. Pretty much, by proposition 42, a Cauchy sequence of infradistributions in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets converging in Hausdorff-distance, so to verify completeness, we merely need to double-check that the Hausdorff-limit of the sets fulfills the various different properties of an infradistribution. Every point in , the limiting set, has the property that there exists some Cauchy sequence of points from the sets that limit to it, and also every Cauchy sequence of points from the sets has its limit point be in .
So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.
For upper completion, given any point , and any a-measure, you can fix a Cauchy sequence limiting to , and then consider the sequence , which is obviously Cauchy (you're just adding the same amount to everything, which doesn't affect the KR-distance), and limits to , certifying that , so is upper-complete.
For closure, the Hausdorff limit of a sequence of closed sets is closed.
For convexity, given any two points and in , and any , we can fix a Cauchy sequence and converging to those two points, respectively, and then consider the sequence , which lies in (due to convexity of all the ), and converges to , witnessing that this point is in , and we've just shown convexity.
For normalization, it's most convenient to work with the positive functionals, and observe that, because all the and all the because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.
Finally, there's the compact-projection property. We will observe that the projection of the a-measures in to just their measure components, call the set , must converge in Hausdorff-distance. The reason for this is because if they didn't, then you could find some and arbitrarily late pairs of inframeasures where and have Hausdorff-distance , and then pick a point in (or ) that's KR-distance away from the other projection. Then you can pair that measure with some gigantic term to get a point in (or , depending on which one you're picking from), and there'd be no point in (or ) within distance of it, because the measure component would only be able to change by if you moved that far, and you need to change the measure component by to land within (or ).
Because this situation occurs infinitely often, it contradicts the Cauchy-sequence-ness of the sequence, so the projections must converge in Hausdorff distance on the space of measures over . Further, they're precompact by the compact-projection property for the (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a series of compact sets is compact, so the Hausdorff limit of the projections (technically, their closures) is a compact set of measures. Further, any sequence which converges to some , has its projection being , which limits to show that is in this Hausdorff limit. Thus, all points in project down to be in a compact set of measures, and we have compact-projection for , which is the last condition we need to check to see if it's an infradistribution.
So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space is complete under the infra-KR metric.
Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that is equipped with, it will converge in the IKR distance for all complete metrics that could be equipped with.
So, as a brief recap, could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on , a sequence of convergent infradistributions will still converge. Use for the original metric on and for the modified metric on , and similarly, and for the KR-metrics on measures, and for the Hausdorff distance induced by the two measures.
Remember, our infradistribution sets are closed under adding to them, and converge according to to the set .
What we'll be doing is slicing up the sets in a particular way. In order to do this, the first result we'll need is that, for all , the set
converges, according to , to the set
So, here's the argument for this. We know that the projection sets
are precompact, ie, have compact closure, and Hausdorff-limit according to to the set
(well, actually, they limit to the closure of that set)
According to our Lemma 3, this means that the set
(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity , the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn't depend on the distance metric is equipped with.
Now, fix any . There's some timestep where, for all greater timesteps, .
Now, picking a point in with , we can travel distance according to and get a point in , and the term can only change by or less when we move our a-measure a little bit, so we know that our nearby point lies in
But, what if our point in has ? Well then, we can pick some arbitrary point (by normalization for ), and go:
And then we have to be a little careful. by assumption. Also, we can unpack the distance to get
And the worst-case for distance, since all the measures have their total amount of measure bounded above by , would be being 1 on one of the measures and -1 on another one of the measures, producing:
So, the distance from to
according to is at most
And then, because this point has a value of at most
Because , the value upper bound turns into
Which is a sufficient condition for that mix of two points to be only distance from a point in with a upper bound on the term, so we have that the distance from
is at most