# 3

Theorem 2: Acausal Commutative Square: The following diagram commutes for any belief function . Any infradistribution  where  also makes this diagram commute and induces a belief function.

Here's how the proof proceeds. In order to show that the full diagram commutes, we first select properties on the four corners such that for each morphism, if the source fulfills the relevant property, then the target fulfills the relevant property. Obviously, the property for the bottom-right corner must be "is a belief function". There are eight phases for this, some of which are quite easy, but the two most difficult are at the start, corresponding to the two morphisms on the bottom side, because we must exhaustively verify the belief function conditions and infradistribution conditions. As usual, lower-semicontinuity is the trickiest part to show. Then, since we have four sides of the square to show are isomorphisms, we have another eight phases where we verify going forward to a different corner and back is identity. Finally, there's an easy step where we rule out that going around  in a loop produces a nontrivial automorphism, by showing that both paths from first-person static to third-person dynamic are equal.

First, let's define our properties.

The property for third-person static is , and  is an infradistribution.

The property for third-person dynamic is , and , and  is an infradistribution.

The property for first-person static is that  is a belief function.

The property for first-person dynamic is , and  when , and  when , and  is an infradistribution.

T2.1 Our first step is showing that all 8 morphisms induce the relevant property in the target if the source has the property.

T2.1.1 3-static to 1-static. Our definition is

and we must check the five conditions of a belief function for , assuming  is an infradistribution on  and it has the property

T2.1.1.1 First, bounded Lipschitz constant. Our proof goal is

Let  be the Lipschitz constant of , which is finite because  is an infradistribution. Then let  be arbitrary (they must be bounded-continuous, though). We have

and then unpack the projection to get

and unpack the update to get

and then by Lipschitzness for ,

Now, when , the two functions are equal, so we have

We'll leave lower-semicontinuity for the end, it's hard.

T2.1.1.2 Now, normalization, ie  To do this, we have

Then this can be rewritten as

and then as a semidirect product.

and then, from our defining property for , and normalization for  since it's an infradistribution, we have

The same proof works for 0 as well.

T2.1.1.3 Now, sensible supports. Pick a  where  and  are identical on histories compatible with . We want to show

To start,

and undoing the projection and update, we have

Remember that actually,  is supported over , ie, the subset of  where the destiny is compatible with the policy, so when  and so  and  behave identically. Making this substitution in, we have

which then packs up as  and we're done.

T2.1.1.4 Now for agreement on max value. For the  type signature on inframeasures, we want to show

Let  be arbitrary. We have

and then the projection and update unpack to

and then, working in reverse from there, we get  and we're done. For the  type signature, we want to show

Let  be arbitrary. We have

which then unpacks as

and then, working in reverse, we get  and we're done.

T2.1.1.5 Time to revisit lower-semicontinuity. We want to show that

This is going to take some rather intricate work with Propositions 2 and 3, and Theorem 1 from Inframesures and Domain Theory. Namely, the supremum of continuous functions is lower-semicontinuous, any lower-semicontinuous lower-bounded function has an ascending sequence of lower-bounded continuous functions which limits to it pointwise, and for any ascending sequence which limits to a lower-semicontinuous function pointwise, you can shuffle the limit outside of an inframeasure. Fix some continuous bounded , either in  or in , and sequence of policies  which converge to .

Our first task is to establish that the subset of  where  or  or ... or  is closed. This can be viewed as the following subset of :

is compact, and  is compact as well, since it limits to a point and the limit point is included. So that product is a compact set, and must be closed. The latter set is  itself. And so, since this set is the intersection of a closed set with , it's closed in  with the subspace topology.

Now, define the functions  as follows. If  has  or  or  or... or , then return . Otherwise, return 1 (or  for the other type signature).

is the function where, if , return , otherwise return 1 (or ).

Our first task is to show that all the , and  are lower-semicontinuous. This is pretty easy to do. They are 1 (or ) outside of a closed set, and less than that inside the closed set.

Lower-semicontinuity happens because we have one of three cases holding. In the first case, the limit point is outside of the closed set (in the complement open set). Then, at some finite stage and forever afterwards, the sequence is outside closed set of interest, and the sequence becomes just a constant value of 1 (or ), and so liminf equals the value of the limit.

In the second case, the limit point is inside the closed set, and there's a subsequence of policy-tagged destinies which remains within the closed set of interest. The liminf of the original sequence of policy-tagged destinies evaluated by  equals the liminf of the subsequence which stays in the "safe" closed set, because all other points in the sequence get maximal value according to  (because they're outside the closed set), so they're irrelevant for evaluating the liminf. Then, inside the closed set,  acts like , which is continuous, so the liminf of  inside the closed set equals the limit of  inside the closed set, which equals  applied to the limit point (in the closed set)

In the third case, the limit point is inside the closed set, but there's no subsequence of policy-tagged destinies which remains within the closed set of interest. So, after some finite stage, the value of  is 1 or infinity, so the liminf is high and the value of the limit point is low, and again we get lower-semicontinuity.

Finally, we need to show that  is an ascending sequence of functions which has  as the pointwise limit. The "ascending sequence" part is easy, because as m increases,  maps a smaller and smaller set (fewer policies) to the value  reports, and more points to 1 (or infinity).

If  has , then regardless of n, , so that works out. If  has , then the sequence  can't hit  infinitely often, otherwise , so after some finite n,  drops out of the closed set associated with , and we have .

Just one more piece of setup. For each , since it's lower-semicontinuous, we can apply our Proposition 3 from the domain theory post to construct an ascending sequence of bounded continuous functions  where  pointwise.

NOW we can begin showing lower-semicontinuity of .

by how  was defined. And then we unpack the projection and update to get

First, we observe that the function inside  is greater than , because  copies  when  or  or ... and is maximal otherwise, while the interior function just copies  when  and is maximal otherwise. So, by monotonicity, we can go

And then, we can use monotonicity again to get

Wait, what's this? Well, if , then , because  was constructed to be an ascending sequence of functions (in n) limiting to . And  is an ascending sequence of functions (in m), so it's below . Thus, the function  lies below , we just applied monotonicity.

Our next claim is that  is an ascending sequence of continuous functions which limits pointwise to . Here's how to show it. The functions are continuous because they're all the supremum of finitely many continuous functions . They ascend because

The first inequality was from considering one less function in the sup. the second inequality is because, for all m,  is an ascending sequence of functions in n.

For showing that the sequence  limits (in n) pointwise to , we fix an arbitrary  and do

The inequality is because eventually n goes past the fixed number , and then we're selecting a single function out of the supremum of functions. The equality is because  limits pointwise in  to . Now, with this, we can go:

Let's break that down. The first equality was because we showed that  is the pointwise limit of the . The second equality is because  makes an ascending sequence of functions, so the supremum of the functions would just be  itself. The first inequality is because , it's one of the approximating continuous functions. The second inequality was the thing we showed earlier, about how the limit exceeds  regardless of . Finally, we use again that  is the pointwise limit of the .

We just showed a quantity is greater than or equal to itself, so all these inequalities must be equalities, and we have

So, the ascending sequence of continuous functions  limits pointwise to . Now that we have that, let's recap what we've got so far.We have

Since  is an ascending sequence of continuous functions in , then by monotonicity of , that sequence must be increasing, so we have

and then, we can apply the monotone convergence theorem for infradistributions, that a sequence of continuous functions ascending pointwise to a lower-semicontinuous function has the limit of the infradistribution expectations equaling the expectation of the lower-semicontinuous supremum, so we have

and then we pack this back up as an update and projection

and back up to get

and lower-semicontinuity of  has been shown. So translating over an infradistribution makes a belief function.

T2.1.2 1-static to 3-static.

Now, we'll show the five infradistribution conditions on , along with the sixth condition on  that it equals .

First, to show it's even well-defined, we have

That inner function is lower-semicontinuous in , because, starting with

then we have that, because  is continuous and  is compact,  is uniformly continuous. So, as n increases, the function  uniformly limits to . Since all the  have a uniform upper bound on the Lipschitz constant (from  being a belief function) they converge to agreeing that  has similar value as , so we get

and then, by lower-semicontinuity for ,

So, since the inner function is lower-semicontinuous, it can be evaluated.

T2.1.2.1 Now for the condition that

And unpack the definition of semidirect product and what  means, to get

This step happens because, if , then  will be assessing the value of the constant-1 function (or constant-infinity function), and will return a maximal value (by condition 5 on belief functions, that they all agree on the value of 1 (or )). So the only way to minimize that line via a choice of  is to make it equal to , as that means that we're evaluating  instead, which may be less than the maximal value.

Now we pack up what  means, and the semidirect product, to get

Pack up the definition of  to get

Pack up the definition of "1-update on " to get

Pack up how projection mappings work, to get:

Pack up what  means and the semidirect product to get

So we have

as desired, since  was arbitrary.

T2.1.2.2 Now to verify the infradistribution conditions on . First, monotonicity. Let

And unpack the definition of semidirect product and what  means, to get

Then, by monotonicity for all the , we have

which packs back up in reverse as .

T2.1.2.3 Now for concavity. Start with  and unpack in the usual way to get

Then, by concavity for the , as they're inframeasures, we have

and distribute the inf

Pulling the constants out, we have

and packing back up in the usual way, we have

Concavity is shown.

T2.1.2.4 For Lipschitzness, we start with  and then partially unpack in the usual way to get

And, since  is a sharp infradistribution, it has a Lipschitz constant of 1, so we have

Since all the have a uniform bound on their Lipschitz constants (one of the belief function conditions), we then have

which rearranges as

and we're done, we got a Lipschitz constant.

T2.1.2.5 The CAS property for  is trival because  is a closed subset of  which is a compact space, and so you can just take all of  as a compact support.

T2.1.2.6 That just leaves normalization. We have

by unpacking  and applying normalization for a belief function.

T2.1.3 For 3-static to 3-dynamic, we have

(which fulfills one of the 3-dynamic conditions), and

Since  is an infradistribution,  is too, and for the condition on , we have:

and we're done, since  had the relevant condition.

T2.1.4 For 3-dynamic to 3-static, we have

First, we must show that the niceness conditions are fulfilled for the infinite semidirect product to be defined at all.  maps  to . This is clearly a continuous mapping, so in particular, it's lower-semicontinuous. Dirac-delta distributions are 1-Lipschitz when interpreted as inframeasures (all probability distributions have that property, actually). The compact-shared-CAS condition is redundant because  is already a compact space. And all probability distributions (dirac-delta distributions are probability distributions) map constants to the same constant, so we get the increase-in-constants property and the 1 maps to 1 property. So we can indeed take the infinite semidirect product of the .

Now, we'll show that unrolling like this just produces  exactly. We'll use  for the initial state (pair of policy and destiny, also written as ), and  for the successive unrolled history of states, actions, and observations. First, by unpacking 's definition,

Then we unpack the projection and semidirect product

And pack things up into a projection

And then we can observe that unrolling the initial state forever via "advance the destiny 1 step, popping actions and observations off, and advance the policy", when you project down to just the actions and observations (not the intermediate states), yields just the point distribution on that destiny, so we get:

And then substitute the value of that dirac-delta in to get

So, our . Now we can go

and we're done, showing  had the relevant condition from  having it. Since  is an infradistribution,  is too.

T2.1.5 For 3-dynamic to 1-dynamic, we have

For the action  (which is compatible with the start of the history), we have

Applying our definition, we have

Unpacking the projection, we have (where  is a state)

Unpacking the update, we have

And then we consider that  is just the dirac-delta distribution on , so we have

Substitute the values in, and we have

and then pack up the dirac-delta and we get

So,  is just the dirac-delta distribution on  and the pair of  and the policy advanced one step, as it should be. For  that aren't  as it should be, we have

Applying our definition, we have

Unpacking the projection, we have (where  is a state)

Unpacking the update, we have

And then we consider that  is just the dirac-delta distribution on , so we have

Substitute the values in, and remembering that  and  are assumed to be different, and we have (regardless of )

Or infinity for the other type signature. Therefore,

So our transition kernel works out. For the other conditions on , just observe that , and use the extreme similarity of their defining conditions.

T2.1.6 For 1-dynamic to 3-dynamic, we have  so we can use our defining property of  again, clearly show that  has the relevant defining properties, so that just leaves cleaning up the defining property for the  infrakernel. We remember that

Therefore, we have:

This occurs because the history must be compatible with the policy. Then, we can unpack as:

We unpack our , getting

Then, we observe that for any . So,  must be , and we have

Now,  is the dirac-delta distribution on , so making that substitution in, we have

Ie

And this holds for all , so we get our desired result that

T2.1.7 For 1-static to 1-dynamic, we just have

So this automatically makes the infrakernel have the desired properties. For showing the relevant properties of , we just copy the proof at the start of getting the properties for  from  being a belief function, since  and  are both defined in the same way.

T2.1.8 For 1-dynamic to 1-static, we have

We'll work on getting that latter quantity into a better form, but first we have to verify that
is even well-defined at all, w need the  to fulfill the niceness conditions. They're defined as

For lower-semicontinuity,  is lower-semicontinuous, because it returns  on every action but one (this is a clopen set), and for the action that pairs up with the initial state, the policy-tagged history just has the history and policy advanced, which is continuous.  behaves similarly, continuous variation in input leads to continuous variation in output, because eventually the behavior of the policy settles down to a fixed action as the history of actions and observations stabilizes amongst one of the finitely many options.

For making inframeasures,  always returns a dirac-delta distribution or , so we're good there. Similarly, for 1-Lipschitzness, both  and dirac-delta distributions are 1-Lipschitz. For compact-shared CAS, as the target space is , it's compact and you don't need to worry about that. Finally, both dirac-delta distributions and  map constants to either the same constant or higher, and in the  type signature, both  and dirac-delta distributions map 1 to 1. So all niceness conditions are fulfilled.

Let's work on unpacking the value

unpacking the projection and semidirect product, we get

And unpacking the update, we have

We can make a substitution which changes nothing (in the scope of the indicator function where ),  that

and then we can write this as a projection in the inside

And then we can realize that when we unroll , it just always deterministically unrolls the history, along with the policy, since for  to be an input state,  must be compatible with , so picking actions from  means there's never any action mismatches. Projecting this down to the actions and observations just makes the history  exactly. So we have

Substituting the dirac-delta value in, we have

And we pack the update and projection back up to get

Since  was arbitrary, we have

And so, an alternate way to define  is as

We can run through the same proof that  is indeed a belief function, back from the 3-static to 1-static case, because  fulfills all the same properties that  did.

Alright, now we have, for all four corners, a condition that's basically "came from an acausal belief function" that is preserved under morphisms. Now we need to show that all 8 back-and-forth directions are identity.

T2.2.1 For 1-static to 3-static back to 1-static, we want to show

This is (we assume the hard version first)

Then unpack the projection and update

Then unpack the semidirect product and  to get

Then realize that the minimizer is picking  exactly, otherwise you'd just get maximal value and all the  agree on what a maximal input maps to.

And we're done.

T.2.2.2 For 3-static to 1-static to 3-static being identity, we want to show

This is just exactly the condition we're assuming on , so we trivially fulfill this (the nontrivial part was shoved into showing that going belief-function to infradistribution over policy-tagged states produced this property).

T2.2.3 For 3-static to 3-dynamic back to 3-static, we need to show that

For this, we abbreviate states beyond the first one as , so  is the initial state, and  is an element of . Taking the complicated part, it's

And then we unpack the semidirect product a bit, to get

And then we can write this as a projection, to get

Now, we observe that because , when repeatedly unrolled, is just emitting the actions and observations from the starting destiny/history, this projection is just the point distribution on the action-observation sequence that is h.

Then we evaluate the expectation, yielding