Infra-Bayesian physicalism: proofs part II

by Vanessa Kosoy34 min read30th Nov 2021No comments

5

Infra-BayesianismWorld Modeling
Frontpage

This post is an appendix to "Infra-Bayesian physicalism: a formal theory of naturalized induction".

Proposition 2.16: Consider some , , and . Define by . Then,

We can prove the second subset inclusion directly as a corollary of Proposition 2.10, just let the of Proposition 2.10 be the projection function , so that just leaves the first subset inclusion direction. If you've seen the proofs so far, you know we do a thing where we try to show subset inclusion with expectations of functions and inequalities instead. And that the proofs all proceed by transforming the expectations until we get a maximum over contribution expectation values, and that's always where the hard part of proving the inequalities shows up. So, let's just get that part over with, an interested reader can work it out with previous proofs as a guide. Unusually, we'll be keeping track of identity functions here.

Plugging in some , and doing our usual activities to get every term into the appropriate form, we can get this result if we manage to show that So, to establish this, we'll show that, given some , we have , and that because, if we show that, then it means that is a rich enough set for the right-hand side of the equation to match anything the left-hand-side can put out.

First off, is pretty trivial to show. The only difference between the two processes is that the coordinate of is discarded immediately on the right-hand-side, and it's preserved for one step and then discarded on the second step for the left-hand-side.

Now for our inequality of interest. Let , and we're trying to show that First off, showing the support condition for which is somewhat nontrivial this time around. We start off with a guarantee that . This happens iff And so, we get that is guaranteed for that pushforward, support condition established.

endofunction condition time. It's important to remember that we want to treat as the computation side of things, and as the environment side of things, for our bridge transform we're working with. and . Begin. Let's unpack precisely what that set is. And we can rewrite the endofunction a little bit And finally apply our endofunction condition, since we've now got the function in a form that's treating as part of the computational universe... And we're done, this establishes our desired result.

Proposition 2.17: is a continuous function of .

The way this proof will work is by describing a composition of functions that makes from , and then showing that each of these functions is continuous, if is a finite set.

Claim: The bridge transform of some is equal to (using to denote restricting an ultradistribution to the event and to denote the inverse of said function, mapping an ultradistribution on to the largest ultradistribution that could have produced it via restriction) Breaking down the unfamilar notation, the type of is , just the usual projection. That asterisk up top is pullback along that function. The type of is . And is pullback along the function given by .

Let's unpack the exact conditions that cause a to lie in the set First off, a is in this set iff it is supported over the event , and it lies in the set Which occurs iff is supported over the event , and for all , lies in the set Which occurs iff is suported over the event , and for all , lies in the set Which occurs iff is supported over the event , and for all , lies in the set

Now, is just doing a little bit of type conversion, so we're justified in ignoring it. Anways, the previous thing occurs iff is supported over the event , and for all , .

Which happens iff is supported over the event and for all and , However, unpacking the left-hand side, we get Which is the exact condition for to lie in the bridge transform. So, we have an equivalence.

Now, since we've phrased the bridge transform as We just need to establish that when all the sets are finite, then pullbacks are continuous, pushforwards are continuous, un-restrictions are continuous, intersections are continuous, and restrictions are continuous. Then, this would just be a particularly fancy continuous function, and accordingly, if limited to , then would limit to .

Let's establish that when the sets are finite, pullbacks are continuous. Let , and and be finite sets, and . Then, we have With the convention that maximizing over the empty set produces a value of 0. That is an alternate phrasing of pullback. We can then go Admittedly, this isn't quite what our usual modified KR metric usually looks like. The reason we can do this is because we're just dealing with functions in , so the norm part of the modified KR metric doesn't matter, and since our sets are finite, we can say that all points are distance 1 from each other, so all functions are 1-Lipschitz, and then the two metrics coincide. So, pullback along any function is continuous.

For pushforward, it's easy because, if , then we've got For showing restrictions continuous, for the set that we're updating on, For intersections... that will take a bit more work. We'll have to use the equivalent formulation of closeness, that limits to iff the Hausdorff distance between the corresponding sets (according to the generalized KR measure) limits to 0. So, our task is to assume that limits to , and limits to , and show that limits to . The bound we'll manage to prove is that Where is the number of elements in the finite set . Here's the basic argument. For any particular point in the set , there's a nearby point in (since the Hausdorff distance is low) with only measure moved around or deleted. So, in particular, if all the measure moved or deleted was just deleted from instead, then that resulting contribution would be below the nearby contribution in that we picked, and so it would lie in as well due to downwards closure.

So, in particular, if and only have a Hausdorff distance of , then, taking any contribution in and subtracting measure from \emph{all points} (if possible, if not, just remove measure till you're at 0) is \emph{guaranteed} to make a point in , and vice-versa.

And a corollary of that is that, given any contribution in , the "subtract measure from each point" contribution is in , also in , and at a maximum distance of from the original contribution. And this argument can be reversed to show that the limit of the intersections is the intersection of the limits (because hausdorff distance between the two goes to 0), so we do in fact have intersection being continuous.

And that just leaves un-restricting. Again, this will take a Hausdorff-distance argument. Fixing some contribution in , it can be broken down into an on-E part , and an off-E part . When you restrict to E, then . Since is within of , there's some that's within of . Then, let your point in be (if there's slightly more than 1 measure there, delete measure from , or all the measure if there's less than present). It's close to because is close to , the other component of it is unchanged, and maybe we deleted a little bit of excess measure which didn't do much.

This line of argument shows that being close to the limit is sufficient to establish that the un-restriction of the two of them are comparably close together. So we have continuity for that, which is the last thing we needed.

Since we wrote the bridge transform as a sequence of continuous functions, we know it's continuous (as long as all the involved sets are finite)

Proposition 3.1: Let be a finite poset, and downward closed. Define by . Observe that is always non-decreasing. Then, .

Proof: Pick a s.t. . Ie, a maximizing contribution. Let be defined as . Ie, it moves a point down to somewhere below it where it can attain the highest value according to . Now, consider . It's present in because was, by assumption, downwards closed, and we just moved all the measure down.

Now, we have And so, all inequalities must be equalities, proving that . In order, the connectives were: unpacking definitions, using downward closure to conclude that , unpacking pushforwards, unpacking the definition of , using that applying a function to the argmax of inputs to that function just makes the max of the function, folding the definition of back up, using that was selected to maximize , and applying monotonicity. Done!

Proposition 4.1: Consider some , , a relation and a PUCK over . Let . Then,

First off, I'm not terribly picky about variable ordering, so I'll just write our desired proof target as The way we'll do this is by establishing the following result. For all monotone , we have Why does that suffice? Well, assume hypothetically that the result held. Since the inequalities go in a circle, we have equality for all monotone functions. And then, for some non-monotone function , we can go and swap out for to show the other equality, and then we'd have equality of the three ultradistributions on all functions, so they're equal.

For the equalities in the above equation, the first one arose because of Proposition 2.4 (bridge transforms are always downwards closed) and Proposition 3.1 (downwards-closed things let you swap out for and it doesn't affect the value). The second equality arose because is a monotone function and by assumption, we have equality for monotone functions. The third equality would arise because taking the downwards closure doesn't affect the expectation value of monotone functions. If you add a bunch of contributions made by measure flowing down, that's just strictly worse from the perspective of a monotone function and doesn't change expectation value. And the fourth equality arises from Proposition 3.1 again.

So, we just need to prove the following three inequalities, for monotone functions . The first one is easily addressable by Proposition 2.7. By proposition 2.7 and the definition of , we have And so, for monotone functions , we have Done.

Now to show our second inequality. Unpack the definition of the set Unpack the definition of The condition is equivalent to . After all, if , the distribution lies in , so would certify that . And if , then no matter the distribution in or kernel selected from , if gets picked, then the kernel selected from isn't going to be making along with it. Since we have that iff characterization, we have is the union of a bunch of for (and convex hull), so its support is equal to the union of the supports for the . Then, since each is a PoCK over , is the restriction of some measure to the set , which will be written as . And now we're about to get an inequality. is monotone, so making the associated set bigger (easier to fulfill the defining condition) should always increase the value of , and by monotonicity, increase the expectation value, so we get Then restate And pack back up as a semidirect product. And we have our second inequality established!

Now, onto the third inequality. Unpack the semidirect products And what top means And as for ... well, each is the convex hull of the various , for . So, the expectation for is the maximum expectation for the various , so we can rewrite as Pick a particular and that attain the maximal value Reexpress a little bit And pack this back up as a semidirect product And then we'll be showing that this contribution lies in . Once we've done that, we can go And we'd be done, having proven the third inequality and the last one to finish up the proof. So, now our proof target switches to showing that . We can show this if we show the support condition and the endofunction condition.

For the support condition, we have And then we use that the are all of the form "take this measure, restrict it to ", to get Unpacking the definitions, we get And so, this contribution is indeed supported on pairs s.t. .

Now for the endofunction condition. As usual, fix an and a . Unpack the semidirect product Plug in the dirac-deltas Reexpress the set membership criterion a bit And the contribution at the start Distribute it in as an indicator function. Pull the other indicator function out. Rewrite with Use an inequality to get rid of the indicator function Rewrite it a bit Swap out for , the latter is larger Swap out for , the latter is larger Abbreviate And bam, endofunction condition is shown, the entire proof goes through now.

Corollary 4.3: Suppose that for any and s.t. , it holds that . That is, the observations predicts to receive from the computer are consistent with the chosen policy. Let be a Cartesian loss function and a policy. Then,

I'm going to be proceeding very cautiously here. First off, make our value visually distinct by swapping it out for Now, by the identifications we made earlier, we can identify with , the space of policies. Using that to unpack the function a little bit, we have Now, we note that intersecting with top of a particular set is equivalent to updating on the indicator function for that set. Using definition 1.5 to unpack , we get Apply that is "what would the agent do on if the agent is copying the behavior of ", so we can rephrase as: Pull off the projection, and use for a destiny in . At this point, we use that , and that is a PUCK over and Proposition 4.1 to go Before we can remove the downwards closure, we'll want to verify the function is monotone. So, we'll want to start unpacking the physicalist loss next. Applying definition 3.1, and using instead of to remember it's a destiny, we have Next up is unpacking . Using definition 3.1, it's At this point, we can, again, treat the same as . And now we need to take a moment to show that gets smaller when gets larger. Applying definition 1.5, the event unpacks as Now, if becomes a larger set, then it gets harder for the first condition to be fulfilled, so the set shrinks. Now, since this happens, it means that if gets bigger, it gets more difficult for the prerequisite of the implication in the indicator function to be fulfilled, so the implication is more likely to hold. Further, the minimization is taking place over a smaller set, so the loss goes up. So our function is monotone in , and we can remove the downwards closure. Unpacking the semidirect product, it is Substituting in the dirac-delta everywhere that is, we get Now, is the set of policies s.t. . The "this policy is consistent with this destiny" relation. Also let's swap out for maximization Now, we're going to try to address that minimum, and show that the only that fulfill the conditions are exactly those . This requires showing that is a sufficient condition to fulfill the relevant properties, and then to show that implies a failure of one of the properties.

So, first up. Assume . Then, for any , and \emph{must} imply that , that's what policy consistency means. Also, unpacks as the two conditions As for the first condition,clearly, if is consistent with , it's consistent with because , and so it must be consistent with any prefix of , so the first condition holds.

For the second condition, is a valid choice, because we assumed , and occurs always, because always being supported on s.t. was one of our problem assumptions.

So, we have one implication direction down. Now for the reverse implication direction. Assume . Then there are two possibilities. the first possibility is that first diverges from on an observation. The second possibility is that first diverges from on an action.

For the first possibility, it's possible to make two policies which are consistent with but also differ in their actions on history , because isn't a prefix of if first differs from on an observation.

For the second possibility, it's ruled out by either the condition for that goes or the extra condition that