Personal Blog

# 0

The final lemma in this sequence will be necessary for use in an upcoming post about logical inductors and correlated equilibria, so it will be proved here in order to not clutter up that post.

Much of the notation in this is reused from the previous post, Two Notions of Best Response

Let be the set of actions available to player 1. . Given a matrix , refers to the -th row vector of . Given some vector , is defined in the usual way. In this post, is typically used to refer to some default action by player 1, and typically refers to some better action.

Consider the space . This would be a space of probabilities/prices assigned to the various joint outcomes, by some logical inductor. It's possible to use expressible features to pick out fuzzy subsets of this space, by taking the prices of the various sentences as input and mapping them to (in practice, the prices will always be in a hypercube, but it'll be convenient to not have to deal with the special case where the prices are on an edge of the hypercube)

The infinite sequence of points that lie in this space can be identified with the sequence of prices of the logical inductor on day .

Consider some fuzzy set/-generable region in that space, bounded in [0,1], which we'll denote by .

Let be the number in [0,1] that the expressible feature which implements outputs when given the market prices of the day . By a slight abuse of notation, refers to what the expressible feature would output if the market prices were given by the point .

iff there's a point such that . In words, there's a point in where is an evidential best response.

Given some point , where some action has nonzero probability (for a well-defined expected utility), then an action dominates by if .

#Lemma 1:

When a point has , all points in an -ball of size around where will have . (approximately)

Proof:

(here, should be interpreted to be any number in , and it doesn't necessarily have to be the same number on the top and bottom. Also, in the next line, will be used as an abbreviation for , which is a number in )

QED.

Now let be the n'th largest utility in the -th row of the utility matrix. refers to the highest utility less than the maximum utility, in the event that there are two actions with maximal utility. is defined similarly.

#Lemma 2:

For all points , there exists a change in probabilities by (according to the -norm), for which the expected utility of the move can be increased by , or pushed to the highest attainable utility, whichever one is lower.

The same argument will also work to establish that the probabilities can be lowered by , or pushed to the lowest attainable utility, whichever one is higher.

Assume that the probability matrix has Then there is an arbitrarily small change in probability that will have have the highest possible expected utility, and it is proved.

Assume that the probability matrix has have less than probability, total, on the which correspond to non-maximal utilities in . Then there is an -change in probabilities (take all the probability mass from non-optimal outcomes and put it on the best outcome) that makes have the highest possible expected utility, and it is proved.

Assume that the probability matrix has have or greater probability, total, on the which correspond to non-maximal utilities in . Then by taking probability mass from the non-optimal outcomes, and putting it on the optimal outcome, it is easy to verify from the expected utility equation that it will produce at least a increase in the expected utility.

#Lemma 3:

If there is a point for which no action dominates by a positive number, and there is a -generable fuzzy set , then must be on the boundary of , or on the exterior of .

Assume the opposite, that this point is on the interior of . Due to the continuity of the fuzzy set, and the fact that is not dominated by any positive number, then there is an arbitrarily small perturbation of the prices to make an evidential best response. Now there is a point where is an evidential best response in the support of . However, we assumed that the support of had no points where was an evidential best response. Therefore, there is a contradiction, and must either be on the boundary of , or the exterior of .

In both cases, due to continuity, this point must have .

#Lemma 4:

Given a -generable fuzzy set such that , and some , then let . Then all points in have some action that dominates by , for some positive constant .

To begin with, the function has some bounded lipschitz constant . Let .

Given a point , associate each action with for that action, and select an arbitrary highest-value action to be .

Consider the function that uses the highest-value action for as , and then outputs when both of these terms are positive, and the term that is positive when the other term is 0. (we can ignore the case where both terms are 0, as we shall soon see)

To begin, assume that both terms are 0. That means that and are both constants. If , then the theorem is proved. If they are equal, then one of two situations apply. In case 1, is never selected as one of the best actions, in which case we can ignore it. In case 2, is selected as one of the best actions somewhere in , and then no action would dominate by a positive number in the interior of , which is impossible by Lemma 3.

Therefore, assuming the theorem hasn't been trivially proved yet, is always defined for all . Also, note that is bounded below by a constant (because there are only finitely many actions, all of which have their corresponding term be a constant, or 0)

Select an arbitrary point , and the corresponding action, such that

If there isn't any, we're done (and .

Otherwise, by Lemma 2, if we look at the ball of -sized perturbations, one of two events will have occurred. In case 1, there's a point in that ball where the "domination gap" given by has been closed completely. In case 2, the maximum possible utility for is less than the minimum possible utility for , and the ball of size has been ruled out, as all points in that ball have dominating by a fixed constant .

In case 1, because the point where the utility gap closes is -away from a point where , and , then (due to the bounded lipschitz constant of ) Then there's a point where is not dominated on the interior of , which by Lemma 3, is impossible.

Therefore, case 2 must apply. However, because the worst-case utility of action dominates the best-case utility of action by , our assumption that the theorem was false is false.

#Lemma 5:

Given some point where dominates by or more, and , and is more than from a point that is in (according to the norm), then the sequence of prices will only finitely often be in the -ball of size ~ around .

We will consider two cases, and prove them seperately.

In case 1, .

By Lemma 1, the change in expected utility for and over the ball will be about, or less than, . Because dominates by or more at , then in this ball, always dominates by or more.

Now, there is a convex -generable fuzzy set which covers this ball (and a tiny region outside of it, due to the continuous indicator functions, but this doesn't affect anything, because the support of that region still has dominating by about or more). Assuming that there are infinitely many points in the ball (by contradiction), then by calibration, the empirical joint probabilities, summed over all time, will be within th support of .

To recap the definition of expected value for a logical inductor,

By assumption, has a positive probability around or greater. By coherence, the logical inductor will eventually learn that the price on should equal the sum of the prices over all the other joint outcomes, so in the limit, . A similar argument applies to learning the utilities of the various outcomes in the top of the fraction, so in the limit, .

Now, since also has probability bounded away from 0, then the same argument applies to it as well. The expected utilities will eventually be learned. Because there's a or larger difference in expected utilities, will be taken with limiting probability 0 when the prices are inside . However, due to calibration, the probability of must be bounded away from 0, so we have a contradiction, and the sequences of prices must have only finitely many elements in the -ball.

Now for case 2. Assume case 1 is false, so for the point , for all which dominate by or more, .

Consider the -ball of size around . There degree of improvement in over the whole ball can be at most , by Lemma 1. Let be the maximum increase over for , for the whole ball.

For each action that is not and has , there is a perturbation of size or less that will either drive their expected utility to or below, or drive their expected utility to the minimum possible.

If has a probability less than (as all the that -dominate do), by taking utility mass from that action (to drive it to probability 0) and putting it on in a way that preserves expected value, that will be counted as having the lowest possible expected utility in the dominance calculation. If has a probability (and has an expected utility within of ) Lemma 2 works to either drive the expected utility of below that of , or drive it to the minimum possible expected utility.

If the worst-case utility for all those actions is equal to or less than , then there's a point in that's within distance of (apply the perturbations mentioned above to make all other actions lose), which violates one of the starting assumptions. Therefore, there must be some action that dominates by more than at , using the worst-case utility, so it dominates by a fixed amount over the whole ball.

Now, we shouldn't necessarily assume that, over the whole ball, accurate utilities for will be learned, because the ball may have regions where the probability of is arbitrarily close to 0. However, density-zero exploration lets the inductor learn a lower bound on the utility of , as we'll see.

Assume that the -ball has infinitely many price points in it (for a contradiction) Then, we can again make a -generable convex set with infinitely many points in it, and because of infinite exploration on divergent -generable subsequences (this is a slight strengthening of Scott's density-zero exploration), the action will be taken infinitely often, and in finite time, the inductor will learn that will be close to, or greater than, the worst-case utility for when the prices are in the set .

Again, we get the logical inductor learning to never take action , which produces a contradiction with the calibration property in the same way as before, because in this region, the price of is bounded away from 0.

Therefore, in case 2, there are still only finitely many points in the ball around , and the theorem has been proved.

#Lemma 6:

One of the following two properties applies to all convex -generable sets of prices over joint outcomes:

1:

2:

(I suspect it's possible to strengthen condition 2 a little bit, but this more than suffices for the upcoming proof in the next post. Also, is the function that is 0 when is not taken on turn , and 1 otherwise.)

Assume 1 is false. If it's true, the proof is over. Also, assume there are infinitely many points in the interior of , because otherwise 2 is true and the proof is over. The space of prices is composed entirely of the following three disjoint sets of points. Set is composed of points where . in sets and , this doesn't hold (and by Lemma 4, there must always be some action that dominates by or more).

Set is convex, because is. Set is the (convex) set of points where , and set is the (convex) set of points where .

Set can have Lemma 5 applied to it to conclude that the sequence of prices will only be within it finitely often. Just identify with in Lemma 5 to get the bounding away from , because that's the minimum distance to travel to get outside of , and identify or something sufficiently smaller with the in Lemma 5 to get the domination condition.

We can split up into three sequences (with some overlap, but it won't matter, because it produces an overestimate)

In the first sequence, the prices for the day are in set . In the second (fuzzy) sequence,the expected utility of is less than (plus a little bit, so an expressible feature can capture this), and in particular, this fuzzy sequence is -generable. In the third sequence, the prices for the day are in set . All days are in one of these three sequences.

Over the first sequence, the time-averaged sum must limit to or less, because , always.

Over the second (fuzzy) sequence, the long-term frequency of being taken must limit to or less, by calibration, so the time-averaged sum must limit to or less.

Over the third sequence, because it has finitely many points in it, the time-averaged sum limits to 0.

Therefore, the time-averaged sum must limit to or less. Because is a fixed constant, and can be as small as we want, the time-averaged sum must converge to 0.

Personal Blog
New Comment