# 4

Formal ProofAI
Frontpage

We're going to need to go out of order here a bit and do Proposition 31 before doing Propositions 29 and 30. Please have patience.

Proposition 31: All five characterizations of the pullback define the same infradistribution.

So, the two requirements of being able to do a pullback  are that:  has  as a support, and  is a proper function (preimage of a compact set is compact). And our five characterizations are:

1: For sets,

ie, take the preimage of  w.r.t. the linear operator that  induces from .

2: For concave functionals,  is the concave monotone hull of the partial function .

3: For concave functionals,  is the monotone hull of the partial function , ie:

4: For concave functionals,

5: For sets, take the closed convex hull of all infradistribution sets which project down to be equivalent to .

The proof path proof path will be first establishing that 3 makes an infradistribution, then showing that 3 and 2 are equivalent, then showing that 2 and 4 are equivalent, then showing that 4 and 5 are equivalent. And now that 2, 3, 4, and 5 are in an equivalence class, we wrap up by showing that 2 implies 1 and 1 implies 5.

First up: Establishing that 3 makes an infradistribution, which is where the bulk of the work is.

Starting with normalization:

And clearly, this supremum is attained by , it can be no higher, by normalization, so we have  and thus one part of normalization. For the other half of normalization,

Now, in order for  to be the case, all we need is that  be 0 on . However, since  is assumed to be a support for , all functions like this must have  and we have the other part of normalization.

Monotonicity: This is immediate because if , then the supremum for  has more options and can attain a higher value.

Concavity:

And we can pick some  and  which very nearly attain the relevant values, as close as you wish, to go:

And then,

So, with that, we can go:

And we're done with concavity.

For Lipschitzness, the proof will work as follows. We'll take two functions  and , and show that  in a way that generalizes to symmetrically show that  to get our result that

where  is the Lipschitz constant of .

So, let's show

What we can do is, because

then we can consider this value to be approximately obtained by some specific function . Then,

The first two inequalities are from monotonicity. The third inequality is because , where  is our special function from a bit earlier, because . So, it isn't necessarily the supremum and doesn't attain maximum value. Then, there's just Lipschitzness of , and  being approximately . And we're done with Lipschitzness, just swap  and  and use earlier arguments.

Before proceeding further, note the following information: Polish spaces are first-countable and Hausdorff so they're compactly generated, and Wikipedia says any proper map to a compactly generated space is closed (maps closed sets to closed sets). So,  is a closed set.

Now just leaves compact almost-support. Fix an , and take a compact -almost-support of . Intersect with  if you haven't already, it still makes a compact set and it keeps being an -almost-support. We will show that , which is compact because we assumed our map was proper, is an -almost-support of .

This proof will proceed by showing that, if  agree on , then . Symmetrical arguments work by swapping  and , showing that the two expectation values are only  apart.

As usual, we have

Where  very nearly attains the supremum, and we know that .

Let the function  (a set-valued function) be defined as:

If , then

If , then

These sets are always concave, and also always nonempty, because, in the second case, the only way it could be nonempty is if

Now, g being a proper map, the preimage of the single point  is a compact set, so we can actually minimize our function, pick out our particular  that maps to . Then, we know:

The second inequality is because  by assumption. Shuffling this around a bit, we have:

where our  is minimizing, which is incompatible with nonemptiness.

So, we've got convexity and nonemptiness. Let's shower lower-hemicontinuity.

As a warmup for that, we'll need to show that the function
Is lower-semicontinuous.

So, our proof goal is: if  limits to , then

And we know that all our  and  itself, lie in .

At this point, here's what we do. Isolate the subsequence of points we're using to attain the . This sequence of points  (when unioned with  itself) is compact in , so its preimage in  is compact. And you can minimize functions over compact sets. So, for each , pick a point  that minimizes . Our sequence of points  is roaming around in a compact set, so we can pick a convergent subsequence. At this point, let's start indexing with . Said limit point  must have  by continuity for , and  by continuity for  (all the  get mapped to , so the limit point  must go to the limit point ) Putting it all together,

And so we have that, if  limits to , then

And  is lower-semicontinuous over .

At this point, we'll try to show that the set-valued function  is lower-hemicontinuous over . We'll use z to denote numbers in .

The criterion for lower-hemicontinuity of  is: If  limits to , and , then there's a subsequence  and points  where

To show this, we'll divide into three cases. In our first case, only finitely many of the  are in . Let's pass to a subsequence where we remove all the  that lie in , and where  converges to a value. Let

Then,

This was because all these sequences have their corresponding values converge, whether by continuity or assumption. So, this takes care of lower-hemicontinuity in one case.

Now for our second case, where only finitely may of the  aren't in . Pass to a subsequence where they are all removed, and where  converges to a value. Accordingly, all our   lie in . Let . Then

Our third case is the trickiest, where there are infinitely many  in , and infinitely many outside that set, and accordingly,  Let's pass to a subsequence where we remove all the  that lie in  and where  converges to a value. We would be able to apply the proof from the first case if we knew that

Obviously, we have , but we still need to show that

and

The second one can be addressed trivially, so now we just have to show that

Pick some minimizing , where , and remember that this  lies in , so , the compact set where . Now we can just go:

And we're done. That's the last piece we need to know that  is lower-hemicontinuous.

Now we will invoke the Michael selection theorem, which gets us a continuous selection  (bounded because  and  are) where, for all .

The conditions to invoke the Michael selection theorem are that  is lower-hemicontinuous (check), that  is paracompact, and that  is a Banach space (check). Fortunately, any metrizable space is paracompact, and Polish spaces are metrizable, so we can invoke the theorem and get our continuous .

Note two important things about this continuous  we just constructed. First, it perfectly matches  over . Second, for all  in  (remember, that's the space it's defined over)

Because of that worst-case bound thing being one of the upper bounds on . Also, over , it remains within  of , again, because of the imposed bounds.

We can use the Tietze extension theorem to extend  to all of  while staying within  of .

At this point, we can return to where we started out in this section, and go:

The first equality was the definition of the preimage, the approximate equality was because  is very very near attaining the true supremum, the  is because  and  agree on  which is an -almost-support for , the next inequality after that is because we just showered that , and then packing up the definition of preimage.

Perfectly symmetric arguments work when you swap  and , yielding our net result that

And  were selected to be arbitrary continuous functions that agree on , which is compact, so we have a compact -almost-support, so  has compact almost-support and is an infradistribution.

Ok, so we're done, we've defined a pullback, it's the monotone hull of the function  under precomposition with , and this was definition 3. Now it's time to start showing equivalences.

First, the equality of definitions 2 and 3. The concave monotone hull of the partially defined pullback must be  than the monotone hull of the partially defined pullback, because we're imposing an additional condition, there are less functions that fit the bill. But, we've showed that the monotone hull is concave, so it's actually an equality.

Now for the equality of of 2 and 4, that the concave monotone hull of the pullback is the same as the inf of all infradistributions that produce  when pushfoward is applied.

We know that the concave monotone hull is indeed an infradistribution, and

So, since its projection forward matches  perfectly, we have

For actual equality, we note that, if , then

And since  is the concave monotone hull of the partial function , and the concave monotone hull is the least concave monotone function compatible with said partial function, and any  which pushes forward to map to  must be concave and monotone and match this partial function, all the  we're inf-ing together lie above , so we have actual equality and

So, at this point, definitions 2, 3, and 4 of the concave hull are equal.

Now for equality of definitions 4 and 5. Taking the inf of infradistributions on the concave functional level is the same as unioning together all the infradistribution sets, and taking the closed concave hull (because that doesn't affect the expectation values), so "inf of infradistributions that pushfoward to make ", on the concave functional side of LF-duality, is the same as "closed concave hull and upper completion of infradistributions which pushforward to make " on the set side of LF-duality.

Now that we know definitions 2-5 of the pullback are all equal, and using  for the expectation functionals corresponding to definitions 3, 1, and 5 of the pullback, we'll show , so , ie, the first definition of the pullback, must be equal to the rest.

First,  is the functional corresponding to the set:

Where  in this case is the function of type signature  given by keeping the  term the same and pushing forward the signed measure component via the function . Note that said pushfoward can actually only make points in , the measures can only be supported over that set.

We'll show that

is monotone and that

To begin with, for monotonicity, if , then

This is easy because our set is entirely of a-measures. So the corresponding  is monotone.

Now, at this point, we can go:

The next line is because  (the continuous pushfoward of a measure, evaluated via ) is the same as  (old measure evaluating )

The next line is a bit tricky, because pushforward seems like it may only net some of the points in H. However, any measure supported over  has a measure over  that, when pushfoward is applied, produces it. Just take each point , craft a Markov Kernel from  to  that maps all  to a probability distribution supported over , take the semidirect product of your measure over  and your kernel, and project to  to get such a measure. So, said pushfoward surjects onto , and we can get:

So, now that we've shown that

is monotone and that

We know that the corresponding functional  has the property that it's monotone and . Thus, it must lie above the monontone hull of , which is just  (third definition of the pullback) and we have: .

Now for the other direction, showing that . The fifth definition of the pullback was the closed convex hull (and upper completion) of all the infradistributions which project to . So, if we can show that any infradistribution set  which projects into  must have its minimal points lie in , we'll have that all infradistribution which project to  are a subset of , and thus is higher in the ordering on functionals, .

This is trivial by  just being a preimage, so any set which pushforwards into the set of interest must be a subset of the preimage.

And we have our desired result that any infradistribution set which projects into  must be a subset of  (all minimal points are present in it), and higher in the information ordering, so the inf of all of them, ie, the functional , must lie above the corresponding functional .

So, we have  and so  equals all of them, and the restricted preimage of H is another formulation of the pullback, and we're done.

Proposition 29: The pullback of an infradistribution is an infradistribution, and it preserves all properties indicated in the table for this section.

So, we know from the proof of Proposition 31 that the pullback of an infradistribution is an infradistribution. Now, to show preservation of properties, we'll work with the formulation of the pullback as:

For homogenity, 1-Lipschitzness, cohomogenity, C-additivity, and crispness, they are all characterized by all minimal points having certain properties of their  and  values. All minimal points of  must pushforward to minimal points of  (otherwise the  term of your original minimal point could be strictly decreased and it'd still pushforward into ), which have the property of interest, and because pushforward  preserves the  and  values of a-measures, we know that all minimal points of  have the same property on their  and , so all these properties are preserved under pullback.

Sharpness takes a slightly more detailed argument. Let's say that  is sharp, ie, its minimal points consist entirely of the probability distributions on . We know from crispness preservation that  has its minimal points consist entirely of probability distributions. If there was a probability distribution that wasn't supported on  (preimage of , which, by the assumed properness of , is compact), that was present as a minimal point, then it would pushfoward to make a minimal point in  that was a probability distribution that had some measure outside of , which is impossible. Further, any probability distribution that is supported on  would pushfoward to make a probability distribution supported on , which would be a minimal point of , so said point must be minimal in . Thus, the minimal points of  consist entirely of probability distributions supported on , so  is sharp, and all nice properties are therefore preserved.

Proposition 30: Pullback then pushforward is identity, , and pushforward then pullback is below identity,

The first part is easy to show,

Yielding pullback then pushforward being identity. For the reverse direction, we use Proposition 31, in particular the part about the pullback being the least infradistribution that pushes forward to produce the target. And we have , therefore

Proposition 32: The free product of two infradistributions exists and is an infradistribution iff:
1: There are points  and  where
2: There are points  and  where , and , and .

So, to show this, what we'll be doing is using our set-form characterization of the pullback as a preimage and the supremum as an intersection. We already know that, from proposition 14, a necessary condition for the intersection of infradistributions being an infradistribution is normalization, ie, on the set level, the presence of a point with  and a point with . The presence of a point with  in the intersection of the preimages would pushforward to make a  point in  and , respectively, with equal  values, fulfilling condition 1.

Also, the presence of a point with  in the intersection of the preimages would pushforward to make a  minimal point in  and  respectively, with equal  values and  values, fufilling condition 2.

Thus, if the free product exists and is an infradistribution, the two properties are fulfilled. That just leaves the reverse direction of establishing that the free product exists and is an infradistribution if both properties are fulfilled.

We'll be handling the set conditions on an infradistribution. For nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure  which projects down to  and , so it's present in the intersection of the preimages of  and  under the projection mappings. This witnesses nonemptiness and one half of normalization.

For the second half of normalization, any point with  would project down to make a point outside of  and , so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure , with , and it projects down to  and  in  and  respectively, witnessing that it's present in the free product, and witnessing that the other half of normalization works out.

For closure and convexity and upper-completeness, the intersection of two closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the  term) is closed-convex upper-complete, so that works out.

That leaves the compact-projection property, which is the only one where we may have issues. Remember, the necessary-and-sufficient conditions for compactness of a set of measures are: bounded  value, and, for all , there's a compact subset of  where all the measure components of the points have all but  measure on said set.

Any point in the intersection of preimages with the measure component having a  value above the minimum of the two  and  values (maximum amount of measure present in points in  and  respectively) would project down to not be in  or , respectively, so those are ruled out, and we've got a bound on the amount of measure present.

As for an -almost-support for all the measure components in , what you do is find a  compact -almost-support for  and a  compact -almost-support for , and consider

We claim this is a compact (product of compact sets) -almost-support for all measure components in .

This is because any measure component in  must project down to be a measure component of  and a measure component of . So, if there was more than  measure outside

then it would project down to either  or  and have more than  measure outside of the two compact sets, which is impossible by the projection landing in  (or ) and the two compact sets being -almost-supports for  and  respectively.

For any  we can make a compact subset of  that accounts for all but  of the measure present of a measure in , so it's got the compact-projection property, which is the last thing we need to certify it's an infradistribution.

Proposition 33: Having (homogenity or 1-Lipschitzness) present on both of the component infradistributions is a sufficient condition to guarantee the existence of the free product.

From Proposition 32, all we need to do is to verify properties 1 and 2, ie
1: There are points  and  where
2: There are points  and  where , and , and .

Given only that (without loss of generality)  is 1-Lipschitz or homogenous, and same for , we'll show properties 1 and 2.

The trick to this is showing that 1-Lipschitzness and homogenity both imply that an infradistribution contains some point with  and , and if both infradistributions fulfill that, then conditions 1 and 2 are fulfilled for free, as can be verified with inspection.

So, let's show that 1-Lipschitzness and homogenity imply there's a point with  and . For 1-Lipschitzness, , and since there must be a point with  by normalization, and all points have  by normalization, said point must have . As for homogenity, all minimal points have , and normalization says there's a point with , getting that there's a point with . So, we're done.

Proposition 34:  and , with equality when  and  are C-additive.

And same for . For equality in the C-additive case, we need to show that every point in  has some point in  which surjects onto it. Remember, C-additivity is "all minimal points have ", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.

The existence of minimal points with  means that, given any point , you can find a point with the same  value in , call it , and then  because it projects down into both sets, showing that  surjects onto  when projected down and so must have equal expectation value. Same for .

Proposition 35: Free product is commutative and associative.

This is because supremum is commutative (order doesn't matter) and associative (parentheses don't matter), because on the set level it's just intersection, which is order-and-grouping agnostic.

Proposition 36: The free product of countably many infradistributions exists and is an infradistribution iff:
1: There are points  where,
2: There are points  where

We recap the proof of Proposition 32. The presence of a point with  or  in a hypothetical infinite free product must project down into all the , and cause the fulfillment of conditions 1 and 2.

For the set conditions on an infradistribution, for nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure  which is present in the intersection of all the a-measure pullbacks. This witnesses nonemptiness and one half of normalization.

For the second half of normalization, any point with  would project down to make a point outside of all the  so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure , with , and it is present in the intersection of all the a-measure pullbacks, witnessing that the other half of normalization works out.

For closure and convexity and upper-completeness, the intersection of closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the  term) is closed-convex upper-complete, so that works out.

That leaves the compact-projection property, which is the only one where we may have issues. For Lipschitzness, we remember that any point in the intersection of preimages with the measure component having a  value above  has some  that it can't project down into, so we get a contradiction, and we've got a bound on the amount of measure present.

As for an -almost-support for all the measure components, you biject your  with the natural numbers, and find a compact set  which is a -almost-support for  and take the product of them. It's compact by Tychonoff, and because each speck of measure of the measure component of a point in the infinite free product that doesn't land in this set projects down to be outside of one of the  sets, the total amount of measure outside this set is upper-bounded by:

So it's a compact -almost-support for the measure components of all a-measures in the infinite free product, and this can work for any , so the infinite free product fulfills the compact almost-support property and has compact projection, which is the last condition we needed.

Proposition 37: , with equality when all the  are C-additive.

For equality in the C-additive case, we need to show that every point in  has some point in  which surjects onto it. Remember, C-additivity is "all minimal points have ", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.

The existence of minimal points with  means that, given any point , you can find a point with the same  value in  (regardless of ), call it , and then  because it projects down into all sets, showing that  surjects onto  when projected down and so must have equal expectation value.

Proposition 38: Ultradistributions  are isomorphic to convex, monotone, normalized, Lipschitz, CAS, functions .

We'll be going over the proof of Theorem 1 again, but with some stuff swapped around.

Our first order of business is establishing the isomorphism. Our first direction is  to  and back is  exactly. By lower completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory", which an interested party can reprove if they want to see it, we can characterize  as

And then, our  can further be reexpressed as

and, by the definition of the convex conjugate, and the space of finite signed measures being the dual space of , and  being a functional applied to an element, this is...

So, our original set  is identical to the convex-conjugate set, when we go from  to  back to a set of a-measures.

Proof Phase 2: In the reverse direction for isomorphism, assume that  fulfills the conditions (we'll really only need continuity and concavity)We want to show that

Let's begin.

Given an , we have a natural candidate for maximizing the , just set it equal to .

So then we get

And this is just... , and, because  is continuous over , and convex, . From that, we get

and we're done with isomorphism.

So, in our first direction, we're going to derive the conditions on the functional from the condition on the set, so we can assume nonemptiness, closure, convexity, upper completion, projected-compactness, and normalization, and derive monotonicity, concavity, normalization, Lipschitzness, and compact almost-support (CAS) from that.

For monotonicity, remember that all points in the infradistribution set are a-measures, so if , then

We could do that because all the measure components are actual measures.

For convexity

And we're done with that. For normalization,

And

So we have normalization.

For Lipschitzness, we first observe that compact-projection (the minimal points, when projected up to their measure components, make a set with compact closure) enforces that there's an upper bound on the  value of a minimal point , because otherwise you could pick a sequence with unbounded , and it'd have no convergent subsequence of measures, which contradicts precompactness of the minimal points projected up to their measure components.

Then, we observe that points in  correspond perfectly to hyperplanes that lie below the graph of , and a maximal point is "you shift your hyperplane up as much as you can until you can't shift it up any more without starting to cut into the function ". Further, for every function , you can make a hyperplane tangent to the function  at that point by the Hahn-Banach theorem, which must correspond to a maximal point.

Putting it together, the epigraph of  is exactly the region above all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to maximal points, and their Lipschitz constants correspond to the  value of the minimal points. Which are bounded. So, Compact-Projection in H implies h is Lipschitz.

Finally, we'll want compact almost-support. A set of measures is compact iff the amount of measure is upper-bounded, and, for all , there is a compact set  where all the measures  have  measure outside of .

So, given that the set of measures corresponding to  is compact by the compact-projection property, we want to show that the functional  has compact almost-support. To do this, we'll observe that if  is the sup of a bunch of functions, and all functions think two different points are only a little ways apart in value, then h must think they're only a little distance apart in value. Keeping that in mind, we have:

And then, we can think of a minimal point as corresponding to a hyperplane  and , and  is the sup of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all minimal points/tangent hyperplanes. Thus, we can get:

And then, we can do some canceling and get:

And then, because  was selected to be 0 on