Proofs Theorem 5

by Diffractor · 3rd Dec 2021


Theorem 5: Causal Commutative Square: The following diagram commutes for any choice of causal . Any choice of infradistribution  where  also makes this diagram commute.

Again, like usual, this proof consists of doing preliminary work (section 0), where we clarify some things and find 4 conditions for the 4 corners that correspond to "maximal representative of a causal belief function". Sections 1.1-8 show that these properties are preserved under all eight morphisms, where, as usual, the complicated part of translating between a belief function and an infradistribution is shoved into the first two sections. Sections 2.1-8 then show the 8 identities needed to establish the 4 isomorphisms, assuming the relevant properties, and section 3 goes in two directions from one corner to another and shows equality, to rule out nontrivial automorphisms.

T5.0 First, preliminaries. We use suprema over: probability distributions  on the space of policies, and utility functions  we associate with each policy , which fulfill the following constraint for some function .

To avoid writing all this out each time, we observe that for the purposes of the proof, all that varies in this format is the function  that we're using as an upper-bound, we always select probability distributions over policies and utility functions linked with the same policy, and impose upper bounds of that "for all copolicies" form on a mix of the relevant utility functions interacting with the copolicy. So we'll suppress all this in the notation and just say

as our abbreviation, with  for when we need a function to have an empty slot for copolicies. So something like  would be "the function from copolicies to  given by letting  interact with the copolicy to get a history, and then applying  to it".

The defining property for the third-person static view is that  is an infradistribution, and, for all functions 

The property for the third-person dynamic view is that  is an infradistribution, and  and 

The property for the first-person static view is that  be a belief function, and that it fulfill causality, that 

The property for the first-person dynamic view is that  is an infradistribution, and  and 

Before showing that these properties are preserved under morphisms, it will be handy to have an equivalent rephrasing of causality for belief functions.

The causality condition is:

We can unconditionally derive the statement

because, letting  be the dirac-delta distribution on , and , we trivially fulfill the necessary condition for the supremum, that 

So, we get the fact

With this knowledge, the causality condition holds iff

and this holds iff

In the first direction, this happens because any particular choice of  and  which fulfills the implication antecedent must have the expectation of  doing worse than the supremum of such expectations, which still loses to .

In the other direction, this happens because the supremum only ranges over distributions on policies and choices of utility function that fulfill the implication antecedent, so the consequent always kicks in. Thus, even the largest possible expectation of  isn't above 

Thus, we will be taking 

as an alternative phrasing of causality for belief functions.
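Since the displayed formulas did not survive extraction, it may help to record the rough shape of the causality condition as recalled from the belief function framework; the symbols here ($\Theta$ for the belief function, $\zeta$ for a distribution over policies, $\pi^d$ for copolicies) are a reconstruction, so check them against the original post:

```latex
% Causality condition for a belief function \Theta (schematic):
% for every policy \pi, utility function U, distribution \zeta \in \Delta\Pi,
% and family of utility functions \{U_{\pi'}\}_{\pi' \in \Pi}:
\Big(\forall \pi^d :\;
  \mathbb{E}_{\pi' \sim \zeta}\big[U_{\pi'}(\pi' \cdot \pi^d)\big]
  \;\le\; U(\pi \cdot \pi^d)\Big)
\;\Longrightarrow\;
\mathbb{E}_{\pi' \sim \zeta}\big[\Theta(\pi')(U_{\pi'})\big]
\;\le\; \Theta(\pi)(U)
```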

Time to show that these properties are preserved under the morphisms. As usual, the hard part is showing property preservation between first-person static and first-person dynamic.

T5.1.1 3-static to 1-static. Our translation is

and we must check the five conditions of a belief function, as well as causality. So the proof splits into six parts for those six desired results. The five conditions for a belief function are a bounded Lipschitz constant, lower-semicontinuity, normalization, sensible supports, and agreement on max value. We'll just be using that  is an infradistribution, and not that it has any additional properties beyond that. The pushforward of an infradistribution is always an infradistribution, so we know that  is always an infradistribution.
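For reference, the pushforward invoked here has the standard functional form from the infradistribution framework (generic notation, $h$ an infradistribution on $X$ and $g : X \to Y$ continuous; this is recalled background, not one of the elided displays):

```latex
% Pushforward of an infradistribution along g, by precomposition:
g_*(h)(f) \;=\; h(f \circ g), \qquad f : Y \to [0,1].
% Monotonicity, concavity, normalization, Lipschitzness, and compact
% almost-support all transfer directly from h to g_*(h), which is why
% the pushforward of an infradistribution is an infradistribution.
```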

T5.1.1.1 For a bounded Lipschitz constant, we have

and then, since  has some Lipschitz constant , we then have

And any  pair induces a history, so this is upper-bounded by

So a uniform Lipschitz constant upper bound is shown.

T5.1.1.2 For lower-semicontinuity, it's actually easy to show for once. We get something even stronger. All causal belief functions aren't just lower-semicontinuous in their first argument, they're continuous. The proof is:

Now, the thing to realize here is that  uniformly converges to . Why does this happen? Well, once  gets close enough to , it will perfectly agree with  about what to do in any situation for the first m steps. So, vs any copolicy at all, the histories  (for all sufficiently late n) and  will agree for the first m steps. And, since  is continuous on  which is a compact space, it must be uniformly continuous. So for any  difference in output, there's some m where knowing the first m steps of history lets you pin down the value of  to within .

Then, we can just appeal to the finite Lipschitz constant of  to conclude that, since we have uniform convergence of functions, we have

which then packs back up in reverse as

That was very easy compared to our typical lower-semicontinuity proofs!
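The argument above compresses to a one-line estimate (a sketch in generic notation: $\lambda$ the Lipschitz constant of the infradistribution $\theta$, $f_n \to f$ the uniformly converging functions; the symbols are assumptions since the original displays are missing):

```latex
\|f_n - f\|_\infty \to 0
\quad\Longrightarrow\quad
|\theta(f_n) - \theta(f)| \;\le\; \lambda \,\|f_n - f\|_\infty \to 0.
% Uniform convergence of the inner functions plus a finite Lipschitz
% constant for \theta gives continuity of the induced functional.
```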

T5.1.1.3 For normalization, we have

And the same goes for 0, because  is an infradistribution.

T5.1.1.4 For sensible supports, assume  agree on histories that can be produced by . Then,

But then, for any , this makes a history that can be made by , so  must agree on it. Those two inner functions are equal, and so attain equal value, and we have .

T5.1.1.5 For agreement on max value, you can just go

and

and you're done.

T5.1.1.6 That just leaves checking causality. We'll make a point to not assume any special conditions on  at all in order to show this, because the

condition is actually saying that  is the maximal infradistribution that corresponds to a causal belief function, but it'll be useful later to know that any infradistribution on copolicies makes a causal belief function.

We'll be showing our alternate characterization of causality, that

So, pick an arbitrary , family of , utility function , and policy , and assume . Then, we can go:

and then, by applying concavity of infradistributions, we can move the expectation inside and the value goes up, and then we apply monotonicity via the precondition we assumed.

and then we just pack this back up.

and we have the causality condition shown.
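The two properties doing the work in that middle step are standard for infradistributions; schematically (generic $h$, recalled from the framework rather than the elided display):

```latex
% Concavity lets the mixture move inside (value can only go up):
h\big(\mathbb{E}_{\pi' \sim \zeta}[f_{\pi'}]\big)
\;\ge\; \mathbb{E}_{\pi' \sim \zeta}\big[h(f_{\pi'})\big].
% Monotonicity then applies the assumed precondition:
g \le g' \text{ pointwise} \;\Longrightarrow\; h(g) \le h(g').
```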

T5.1.2 1-static to 3-static. Our translation is:

The way this proof works is first, we unpack  into a different form in preparation for what follows, that's phase 1. Then there's 6 more phases after that, where we show the five infradistribution conditions on  and the causality condition. Some of these are hard to show so we'll have to split them into more parts.

T5.1.2.1 We'll unpack this first, in preparation for showing the infradistribution properties and causality property for . We can go

and then express the intersection as a supremum over  and  such that  because that's just how intersections work. This is abbreviated as .

Then, for the pullback, we take the supremum over  where  for all . We'll write this as .

This makes

Now, these  always make an acceptable substitute for the  functions early on, as they're always lower than  regardless of copolicy. Further, if you're attempting to attain the supremum, you'll never actually use your ; it will always be  being plugged in. So, we can just wrap up all the maximization into one supremum, to make

We'll be using this phrasing as a more explicit unpacking of .

Now, for the infradistribution conditions on , and the causality condition, there's a very important technicality going on. Intersection behaves much worse for the  type signature. In the  type signature, everything works out perfectly. However, in the  type signature, we'll be able to show all the needed properties for  except having a finite Lipschitz constant. But, not all is lost, because to compensate for that, we'll show that you can swap out  (almost an infradistribution) for an actual infradistribution  that induces the same belief function .

Like we said back in the post, it isn't terribly important that we use the true maximal representative , we just want some infradistribution in the same equivalence class.

T5.1.2.2 First up, monotonicity. Assume . Then, using our rewrite of  and , we have

and monotonicity is shown. This occurred because we're taking the supremum over probability distributions and choices of utility function where the mix underperforms  as an upper bound. Since  is lower, we're taking the supremum over fewer options.

T5.1.2.3 Now, concavity. We'll work in reverse for this.

Let's regard the supremum as being attained (or coming arbitrarily close to being attained) by some particular choice of , and families  and (which are upper-bounded by  respectively) to get

We can rewrite this as an integral to get

and then move the  and  in to get integration w.r.t. a measure.

And then, since  and  are both absolutely continuous w.r.t. , we can rewrite both of these integrals w.r.t. the Radon-Nikodym derivative, as


and pack these up into one integral as the probability distribution we're integrating over is the same in both cases,


And then, because

(well, technically, this may not hold, but only on a set of measure 0 relative to , so we can fiddle with things a bit so this always holds), we can apply concavity of  because it's an inframeasure, for all , and get


and then pack it back up as an expectation, to get

We will now attempt to show that, for all copolicies ,


We can take that big expectation and write it as

and then write it as an integral


Split it up into two integrals


The Radon-Nikodym derivatives cancel to leave you with

and then pull out the constants to get

pack up as an expectation

And use that

to get an upper bound of

Ok, we're done. We now know that  is a probability distribution (mix of two probability distributions). We know that the expectation w.r.t. that distribution of the functions

undershoots . And since this is a mix of two existing utility functions, it'll be in  if the original utility functions were in that range already (or bounded if the originals were). Taking stock of what we've done so far, we've shown that

but the probability distribution and functions used end up mixing to undershoot . So we can impose an upper bound of:

which then packs up to equal

and we're done, concavity has been shown.
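For orientation, the concavity property invoked throughout this phase is the usual inframeasure one (schematic form, $p \in [0,1]$; recalled background, not one of the missing displays):

```latex
h\big(p f + (1-p) g\big) \;\ge\; p\, h(f) + (1-p)\, h(g).
% Applied pointwise in the copolicy argument above, with p the
% mixing weight between the two branches of the expectation.
```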

T5.1.2.4 The compact-almost-support property on  is trivial since  is a compact space.

T5.1.2.5 For normalization, we'll need to do it in two parts, for both 0 and 1.

T5.1.2.5.1 First, we show normalization for 1, which requires showing both directions of the inequality. We start with

and a possible choice that the supremum ranges over is the dirac-delta distribution on the  with the worst value of  (which is 1, by normalization for belief functions), and all the  are 1. So then we get

A lower bound of 1. For getting an upper bound, we have

and then, the supremum ranges over  where 

Now, 1 can be written as  where  is the utility function which maps everything to 1, and  is the policy with the worst value of , and we can apply our alternate formulation of causality for  to guarantee that

regardless of our choice of . And so, we can impose an upper-bound of

and we're done, we've got inequalities going both ways, so 

T5.1.2.5.2 Now for 0. We have

and one thing that the supremum ranges over is the dirac-delta on  which attains the worst value of , and all the  being 0, so we get

and we have a lower-bound of 0. For an upper bound,

and then, the supremum ranges over  where 

Now, 0 can be written as  where  is the utility function which maps everything to 0, and  is the policy with the worst value of , and we can apply our alternate formulation of causality for  to guarantee that

regardless of our choice of . And so, we can impose an upper-bound of

and we're done, we've got inequalities going both ways, so .

T5.1.2.6 Now for Lipschitzness, which we'll need to split in two for the two type signatures.

T5.1.2.6.1 This will be completely provable for the  type signature on infradistributions. What we'll do is show that for any two functions , when is the upper bound on the Lipschitz constant for the belief function , we have

Since both  must be above , by monotonicity, this result would show that

and then we'd be done, that's Lipschitzness. So let's get working. We unpack as

Consider the supremum of the first term to be attained by some choice of  fulfilling the property that . Then we can rewrite as

Now, let your choice of  be , and your choice of the  family be . We do need to show that

This is easy, we can fix an arbitrary , and go

So, this choice of  and  works out. Now we can go

And then, by monotonicity and a shared Lipschitz constant for the , we have

And we've shown what we came here to show,

 were arbitrary, so by monotonicity for , we have

and  has a finite Lipschitz constant for the  type signature.
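The net result of this subsection, in schematic form (with $\lambda$ the uniform Lipschitz bound for the $\Theta(\pi)$; notation assumed), is the usual sup-norm Lipschitz estimate:

```latex
|h(f) - h(g)| \;\le\; \lambda\, \|f - g\|_\infty.
% Obtained by bounding h(f) \le h(g + \|f - g\|_\infty) via monotonicity,
% then peeling off the constant with the shared Lipschitz bound.
```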

T5.1.2.6.2 Now for the fiddly bits with the  type signature. The reason the above  proof doesn't work for that type signature is that all our functions have to remain in , and it's possible that subtracting  from all the  results in functions that go outside of , and it's way way harder than it looks to patch this.

We have all the other properties, though. What an unbounded Lipschitz constant for  (as an expectation functional) looks like from the set point of view is that we've got minimal points in the set of a-measures over  (our ) with unboundedly large measure components. However, all the  infradistribution sets have a bounded Lipschitz constant of , so all their minimal points can have at most  measure.

If you took your set , and chucked out all the minimal points with measure greater than  and built up an infradistribution  from the remaining minimal points,  would produce  again if you went back to a belief function. Why?

Well, the minimal points of an inframeasure/distribution are the truly critical part of them. As long as two sets of a-measures agree on which a-measures are minimal points, they're completely 100 percent equivalent in all respects. We travel back to a belief function  via applying the pushforward . Doing this pushforward in the set view keeps all the measure components of a-measures the same. So, chucking out all the a-measures in  with too much measure doesn't throw out any a-measures in  which are necessary to produce a minimal point in some  (all of those have  measure or less). When we apply the pushforward to  instead of , we get a smaller set, sure, but that set perfectly agrees with the pushforward of  on the minimal points, so the sets they produce are equivalent. You get the same belief function back.
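The "minimal points are all that matters" fact used here can be recalled explicitly: in set form, the expectation an inframeasure set $H$ assigns to a function is an infimum over its a-measures, and that infimum is always attained on minimal points (schematic, $(m, b)$ a measure/constant pair; recalled background):

```latex
E_H(f) \;=\; \inf_{(m,b) \in H} \big( m(f) + b \big).
% The infimum is attained on the minimal points of H, so any two sets
% agreeing on their minimal points induce identical expectations, and
% are equivalent as inframeasures.
```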

T5.1.2.7 Now for the last part, showing that we get the causality condition on . Using our rewrite of , we have

and then, by causality for , we get

Note that there's a dependence of the second probability distribution and choice of utility functions on the  selected, because we have a separate  for each . Since our rewrite of  was

we can abbreviate the interior of the big equation we have to get

Which is the causality condition for .

T5.1.3 For 3-static to 3-dynamic, it's easy, since  so it can just copy over the relevant causality property from , and we specified that the transition dynamics are exactly what we need, 

T5.1.4 For 3-dynamic to 3-static, this will be tricky, so first we set up some preliminaries. The translation process is unrolling the action-observation sequence to get an infradistribution over destinies, and then turning that into a belief function, and then turning that back into a set of copolicies. Ie,

We can save on some work. We recently established that going from a causal belief function to the space of copolicies produces an infradistribution over copolicies with the causality condition. So, actually, all we need to do is show that doing the first two steps (and skipping the third) makes a causal belief function, and we'll be done.

This is essentially going from third-person dynamic to third-person static (in destiny form), to first-person static, so we should expect it to be a bit hard. So our goal is that

ends up making a causal belief function.

Obviously, we need to show the causality property for . However, for all other belief function conditions, we get them for free as long as we have an actual infradistribution over histories (this can be shown by taking the arguments that you get a belief function from translating over an infradistribution back in the Acausal Square Theorem, and adapting them to the pseudocausal case, which is pretty easy). So, actually, all we need to show is that  is causal, and that  is an infradistribution over histories, assuming that  is an infradistribution over copolicies. 

So, the proof splits into four parts where we verify the niceness conditions on  in order to show that the above line is an infradistribution over histories, and one part where we verify causality for .

First up, verifying the niceness conditions for  so  is well-defined.

T5.1.4.1 We'll start with lower-semicontinuity, though actually, we'll be going for something stronger, continuity.

and then unpack the semidirect product and substitute the dirac-delta values in to get

and unpack what  means to get

There are finitely many actions, so we can shift the limit in, to get

Now, as  approaches , the observation returned for the first action stabilizes, so we get

And then, due to continuity of , and  converging to  because  converges to , we have

Which then packs up in reverse as

And continuity has been shown. This inducts up to show continuity for .

T5.1.4.2 1-Lipschitzness holds because  is the semidirect product of a crisp infradistribution () (and all crisp infradistributions are 1-Lipschitz) with a crisp infrakernel (also 1-Lipschitz), and the semidirect product of a 1-Lipschitz infrakernel and a 1-Lipschitz infradistribution is 1-Lipschitz.
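To spell out the first claim (recalled background, generic notation): a crisp infradistribution is, in set form, a set $C$ of probability distributions, and an infimum of 1-Lipschitz functionals is 1-Lipschitz:

```latex
h(f) \;=\; \inf_{\mu \in C} \mathbb{E}_\mu[f]
\quad\Longrightarrow\quad
|h(f) - h(g)| \;\le\; \sup_{\mu \in C} \big|\mathbb{E}_\mu[f] - \mathbb{E}_\mu[g]\big|
\;\le\; \|f - g\|_\infty.
```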

T5.1.4.3 Compact-shared almost-compact-support is trivial because  is a compact space.

T5.1.4.4 For "constants increase", actually, crisp infrakernels map constants to the same constant. So we get a stronger result, that , and in particular,  (the property inducts up appropriately). This also has the implication that all the  are infradistributions, as they map 0 to 0 and 1 to 1. This also bags the last condition, 1-normalization. That's all the niceness conditions, so  is well-defined.

Now that we know it's well-defined at all, our goal is to show that

is an infradistribution. This is, again, easy. The semidirect product of an infradistribution and a continuous infrakernel that returns infradistributions is an infradistribution, and the projection of an infradistribution is an infradistribution.
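Both closure facts use the standard functional forms from the framework (recalled notation: $h$ over $X$, an infrakernel $K : X \to Y$, and the projection $\mathrm{pr} : X \times Y \to Y$; these are background, not the elided displays):

```latex
% Semidirect product: feed the kernel's output back through h.
(h \ltimes K)(f) \;=\; h\big(\lambda x.\; K(x)\big(\lambda y.\; f(x, y)\big)\big).
% Projection is just a pushforward, hence preserves infradistributions:
\mathrm{pr}_*(h \ltimes K)(g) \;=\; (h \ltimes K)(g \circ \mathrm{pr}).
```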

T5.1.4.5 With that, all that remains is showing that , defined as 

fulfills the causality property. We can rewrite the update and projection to get

and rewrite the semidirect product

and then write this as a projection,

and then write this as an update