We consider transparent games between bounded CDT agents ("transparent" meaning each player has a model of the other players). The agents compute the expected utility of each possible action by executing an optimal predictor of a causal counterfactual, i.e. an optimal predictor for a function that evaluates the other players and computes the utility for the selected action. Since the agents simultaneously attempt to predict each other, the optimal predictors form an optimal predictor system for the reflective system comprising the causal counterfactuals of all agents. We show that for strict maximizers, the resulting outcome is a bounded analogue of an approximate Nash equilibrium, i.e. a strategy which is an optimal response within certain resource constraints up to an asymptotically small error. For "thermalizers" (agents that choose an action with probability proportional to $2^{u/T}$), we get a similar result with expected utility $\mathrm{E}_s[u]$ replaced by "free utility" $\mathrm{E}_s[u] + T\mathrm{H}(s)$. Thus, such optimal predictor systems behave like bounded counterparts of reflective oracles.
Preliminaries
The proofs for this section are given in Appendix A.
We redefine $\mathcal{E}_2(\mathrm{ll},\phi)$ and $\mathcal{E}_2(\mathrm{ll})$ to be somewhat smaller protoerror spaces which nevertheless yield the same existence theorems as before. This is thanks to Lemma A.1.
Construction 1
Given $\phi \in \Phi$, denote $\mathcal{E}_2(\mathrm{ll},\phi)$ the set of bounded functions $\delta: \mathbb{N}^2 \to \mathbb{R}^{\geq 0}$ s.t.

$$\forall \psi \in \Phi: \psi \leq \phi \implies \mathrm{E}_{\lambda^k_\psi}[\delta(k,j)] = O(\psi(k)^{-1})$$

Denote

$$\mathcal{E}_2(\mathrm{ll}) := \bigcap_{\phi \in \Phi} \mathcal{E}_2(\mathrm{ll},\phi)$$
Proposition 1
If $\phi \in \Phi$ is s.t. $\exists n: \liminf_{k \to \infty} 2^{-kn}\phi(k) = 0$, then $O(\phi^{-1})$ is a protoerror space.

For any $\phi \in \Phi$, $\mathcal{E}_2(\mathrm{ll},\phi)$ is an ample protoerror space. When $\phi$ is nondecreasing, $\mathcal{E}_2(\mathrm{ll},\phi)$ is stable.

$\mathcal{E}_2(\mathrm{ll})$ is a stable ample protoerror space.
Notation
We denote $O(\phi^{-\frac{1}{\infty}}) := O(\phi^{-1})^{\frac{1}{\infty}}$. We allow the same abuse of notation for this symbol as for usual big-$O$ notation.

For any (poly,rlog)-bischeme $\hat{Q} = (Q, r_Q, \sigma_Q)$ we use $\hat{\sigma}^{kj}_Q$ to denote $U^{r_Q(k,j)} \times \sigma^{kj}_Q$.

For reflective systems $\mathcal{R} = (\Sigma, f, \mu)$ we write indices in $\Sigma$ as superscripts rather than subscripts as before.

Given $\pi \in \Pi^\Sigma$, we denote $U(\pi)^{nkj} := U^{\pi^{nkj}_2}$ and $\pi^{nkj}(x,y) := \beta(\mathrm{ev}(\pi^{nkj}_1; x, y))$.
Results
The proofs for this section are given in Appendix B.
We start by showing that choosing an action by maximizing over optimally predicted utility is an optimal strategy in some sense.
Theorem 1
Fix a protoerror space $\mathcal{E}$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a: \mathrm{supp}\,\mu \to [0,1]\}_{a \in A}$. Suppose $\{\hat{P}_a\}_{a \in A}$ are $\mathcal{E}$(poly,rlog)-optimal predictors for $f$. Assume that $\sigma_{P_a}$ and $r_{P_a}$ don't depend on $a$ (this doesn't limit generality since we can replace $\sigma_{P_a}$ by $\prod_{b \in A} \sigma_{P_b}$ and $r_{P_a}$ by a polynomial $r_P \geq \max_{b \in A} r_{P_b}$). Consider $\hat{A}$ an $A$-valued (poly,rlog)-bischeme. Define $\hat{M}$ another $A$-valued (poly,rlog)-bischeme where $\hat{M}^{kj}(x)$ is computed by sampling $y$ from $\hat{\sigma}^{kj}_P$ and choosing arbitrarily from $a \in A$ s.t. $\hat{P}^{kj}_a(x,y)$ is maximal. Then, there is $\delta \in \mathcal{E}^{\frac{1}{2}}$ s.t.

$$\mathrm{E}_{\mu^k \times \hat{\sigma}^{kj}_M}[f_{\hat{M}^{kj}}] \geq \mathrm{E}_{\mu^k \times \hat{\sigma}^{kj}_A}[f_{\hat{A}^{kj}}] - \delta(k,j)$$
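As a toy illustration (an unbounded sketch with invented names, not the paper's bischeme machinery): the maximizer $\hat{M}$ simply samples the predictors' shared randomness once and returns an action whose predicted utility is maximal.

```python
import random

# Toy sketch of the maximizer M-hat from Theorem 1 (all names invented):
# each action a has a predictor P[a](x, y) estimating its utility f_a(x);
# M samples the shared randomness y once and returns an action whose
# predicted utility is maximal (ties broken arbitrarily).

def make_maximizer(predictors, sample_y):
    def M(x):
        y = sample_y()
        return max(predictors, key=lambda a: predictors[a](x, y))
    return M

# Two actions whose predictors are the true utilities plus shared noise;
# since both predictions shift by the same amount, M ranks them correctly.
f = {"a": lambda x: 0.8, "b": lambda x: 0.3}
P = {a: (lambda a: lambda x, y: f[a](x) + 0.05 * y)(a) for a in f}
M = make_maximizer(P, sample_y=lambda: random.uniform(-1.0, 1.0))
```

The theorem's content is that this simple "argmax of predicted utility" rule loses at most an asymptotically small amount against any competing bischeme of comparable resources.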
Definition 1
Given suitable sets $X, Y$ and $\phi \in \Phi$, a $\phi$(poly,rlog)-scheme of signature $X \to Y$ is $\hat{A} = (A: \mathbb{N} \times X \times (\{0,1\}^*)^2 \to Y,\ r_A: \mathbb{N} \to \mathbb{N},\ \sigma_A: \mathbb{N} \times \{0,1\}^* \to [0,1])$ s.t.

(i) The runtime of $A^k$ is bounded by $p(t_\phi(k))$ for some polynomial $p$.

(ii) $r_A(k) \leq p'(t_\phi(k))$ for some polynomial $p'$.

(iii) $\sigma^k_A$ is a probability measure on $\{0,1\}^*$ and there is $c > 0$ s.t. $\mathrm{supp}\,\sigma^k_A \subseteq \{0,1\}^{\leq c\lfloor \log(k+2)\phi(k) \rfloor}$.

A $\phi$(poly,rlog)-scheme of signature $\{0,1\}^* \to Y$ is also called a $Y$-valued $\phi$(poly,rlog)-scheme.

As usual, we sometimes use $\hat{A}^k$ as implicit notation in which $A$ receives a value sampled from $U^{r_A(k)}$ for the second argument and/or a value sampled from $\sigma^k_A$ for the third argument.
Corollary 1
Fix $\phi \in \Phi$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a: \mathrm{supp}\,\mu \to [0,1]\}_{a \in A}$. Suppose $\{\hat{P}_a\}_{a \in A}$ are $\mathcal{E}_2(\mathrm{ll},\phi)$(poly,rlog)-optimal predictors for $f$. Consider $\epsilon \in (0,1)$ and $\hat{A}$ an $A$-valued $\phi^{1-\epsilon}$(poly,rlog)-scheme. Define $\hat{M}$ as in Theorem 1. Then

$$\mathrm{E}_{\mu^k \times \lambda^k_\phi}\!\left[\mathrm{E}_{\hat{\sigma}^{kj}_M}[f_{\hat{M}^{kj}}]\right] \geq \mathrm{E}_{\mu^k \times \hat{\sigma}^k_A}[f_{\hat{A}^k}] - O(\phi^{-\frac{1}{\infty}})$$
We now introduce the game theoretical setting.
Definition 2
A distributional game is a quadruple $G = (\mu, P, A, u)$ where $\mu$ is a word ensemble, $P$ is a finite set representing players, $\{A_n\}_{n \in P}$ are finite sets representing the possible actions of each player and $\{u_n: \mathrm{supp}\,\mu \times \prod_{m \in P} A_m \to [0,1]\}_{n \in P}$ represent the utility functions.

For each $x \in \mathrm{supp}\,\mu$, $G$ defines an ordinary game in normal form. We think of $G$ as a single game where the strategies of player $n$ are (some class of) functions $s_n: \mathrm{supp}\,\mu \to A_n$ and the payoffs are $\mathrm{E}_{x \sim \mu}[u_n(x, \prod_{m \in P} s_m(x))]$.
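A minimal concrete instance (toy payoffs, invented names): two players, two actions, and a utility that depends on the sampled word $x$, so that strategies which condition on $x$ can outperform any constant reply.

```python
# Toy distributional game (invented payoffs): two players, a word ensemble
# mu over {"0", "1"}, and a utility for the row player that depends on the
# sampled word x as well as on both chosen actions.

mu = {"0": 0.5, "1": 0.5}  # word ensemble at a single level k

def u_row(x, a_row, a_col):
    # coordination pays off when x == "0", anti-coordination when x == "1"
    match = 1.0 if a_row == a_col else 0.0
    return match if x == "0" else 1.0 - match

def payoff(s_row, s_col):
    # E_mu[u_row(x, s_row(x), s_col(x))]: strategies are functions of the word
    return sum(p * u_row(x, s_row(x), s_col(x)) for x, p in mu.items())

const = payoff(lambda x: "a", lambda x: "a")                          # 0.5
adaptive = payoff(lambda x: "a", lambda x: "a" if x == "0" else "b")  # 1.0
```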
Construction 2
Consider $G = (\mu, P, A, u)$ a distributional game and $\{\phi_n \in \Phi\}_{n \in P}$. We construct the reflective system $\mathcal{R}^\phi_G = (P, u^\phi, \mu_G)$.

$\mu_G$ is defined by

$$\mu^{nk}_G(k, x, a) = \frac{\mu^k(x)\,\chi_{A_n}(a)}{\# A_n}$$

Define $M^n_G: \mathbb{N}^2 \times \mathrm{supp}\,\mu \times \{0,1\}^* \times \Pi^P \to \wp(A_n)$ by

$$M^{nkj}_G(x, y, \pi) := \operatorname*{argmax}_{a \in A_n} \pi^{nkj}((k, x, a), y)$$

Denote $\Delta^n_G$ the space of mixed actions of player $n$, i.e. the space of probability measures on $A_n$.

Define $s^n_{G,\phi}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \Delta^n_G$ by

$$\Pr_{s^{nk}_{G,\phi}(x,\pi)}[a] := \mathrm{E}_{\lambda^k_{\phi_n}}\!\left[\mathrm{E}_{U(\pi)^{nkj}}\!\left[\frac{\chi_{M^{nkj}_G(x,y,\pi)}(a)}{\# M^{nkj}_G(x,y,\pi)}\right]\right]$$

Define $s_{G,\phi}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \prod_{n \in P} \Delta^n_G$ by

$$s^k_{G,\phi}(x, \pi) := \prod_{n \in P} s^{nk}_{G,\phi}(x, \pi)$$

Define $\bar{s}^n_{G,\phi}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \prod_{m \in P \setminus n} \Delta^m_G$ by

$$\bar{s}^{nk}_{G,\phi}(x, \pi) := \prod_{m \in P \setminus n} s^{mk}_{G,\phi}(x, \pi)$$

Finally, define $u^\phi$ by

$$u^{nk}_\phi(x, a, \pi) := \mathrm{E}_{a \times \bar{s}^{nk}_{G,\phi}(x,\pi)}[u_n]$$

For any (poly,rlog)-predictor $\hat{Q}$ we will use the notation $s^{nk}_{G,\phi}(x, \hat{Q}) \in \Delta^n_G$ to mean

$$s^{nk}_{G,\phi}(x, \hat{Q}) := \mathrm{E}_{\sigma_Q}[s^{nk}_{G,\phi}(x, \hat{Q}[z])]$$

For a family $\{\hat{Q}^n\}_{n \in P}$ of (poly,rlog)-predictors, we will use the notations $s^k_{G,\phi}(x, \hat{Q}) \in \prod_{n \in P} \Delta^n_G$ and $\bar{s}^{nk}_{G,\phi}(x, \hat{Q}) \in \prod_{m \in P \setminus n} \Delta^m_G$ to mean

$$s^k_{G,\phi}(x, \hat{Q}) := \prod_{n \in P} s^{nk}_{G,\phi}(x, \hat{Q}^n)$$

$$\bar{s}^{nk}_{G,\phi}(x, \hat{Q}) := \prod_{m \in P \setminus n} s^{mk}_{G,\phi}(x, \hat{Q}^m)$$
Corollary 2
Fix $G = (\mu, P, A, u)$ a distributional game, $\{\phi_n \in \Phi\}_{n \in P}$ and $\hat{P}$ an $\mathcal{E}^*_2(\mathrm{ll},\phi)$(poly,rlog)-optimal predictor system for $\mathcal{R}^\phi_G$. Consider $n \in P$, $\epsilon \in (0,1)$ and $\hat{A}$ an $A_n$-valued $(\phi_n)^{1-\epsilon}$(poly,rlog)-scheme. Then

$$\mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{s^k_{G,\phi}(x,\hat{P})}[u_n]\right] \geq \mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\hat{A}^k(x) \times \bar{s}^{nk}_{G,\phi}(x,\hat{P})}[u_n]\right] - O((\phi_n)^{-\frac{1}{\infty}})$$
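Corollary 2 says the induced profile is a resource-bounded analogue of an approximate Nash equilibrium. In the unbounded, finite setting this property reduces to the usual $\epsilon$-equilibrium check, which can be sketched as follows (toy code, invented names):

```python
# Toy eps-equilibrium check (invented names; the unbounded analogue of the
# property in Corollary 2): no player can gain more than eps by deviating.
# Checking pure deviations suffices, since a mixed deviation's payoff is a
# convex combination of pure ones.

def expected_u(u, profile):
    # u maps pure action pairs to payoffs; profile is a pair of mixed actions
    return sum(p * q * u[(a, b)]
               for a, p in profile[0].items()
               for b, q in profile[1].items())

def is_eps_equilibrium(u_row, u_col, profile, eps):
    row_gain = max(expected_u(u_row, ({a: 1.0}, profile[1]))
                   for a in profile[0]) - expected_u(u_row, profile)
    col_gain = max(expected_u(u_col, (profile[0], {b: 1.0}))
                   for b in profile[1]) - expected_u(u_col, profile)
    return row_gain <= eps and col_gain <= eps

# Matching pennies: the uniform profile is an exact equilibrium.
u_row = {("h", "h"): 1, ("h", "t"): 0, ("t", "h"): 0, ("t", "t"): 1}
u_col = {k: 1 - v for k, v in u_row.items()}
uniform = ({"h": 0.5, "t": 0.5}, {"h": 0.5, "t": 0.5})
```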
We now move from studying strict maximizers to studying thermalizers.
Theorem 2
Fix a protoerror space $\mathcal{E}$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a: \mathrm{supp}\,\mu \to [0,1]\}_{a \in A}$. Suppose $\{\hat{P}_a\}_{a \in A}$ are $\mathcal{E}$(poly,rlog)-optimal predictors for $f$ and $T: \mathbb{N}^2 \to \mathbb{R}^{>0}$ is some function. Assume that $\sigma_{P_a}$ and $r_{P_a}$ don't depend on $a$. Consider $\hat{A}$ an $A$-valued (poly,rlog)-bischeme. Suppose $\hat{M}_T$ is another $A$-valued (poly,rlog)-bischeme satisfying

$$\Pr_{\hat{\sigma}_M}[M^{kj}_T(x) = a] - \mathrm{E}_{\hat{\sigma}_P}\!\left[Z^{kj}_T(x,y)^{-1}\, 2^{\frac{P^{kj}_a(x,y)}{T^{kj}}}\right] \leq \delta_r(k,j)$$

Here $y$ is sampled from $\hat{\sigma}_P$, $Z^{kj}_T(x,y) := \sum_{a \in A} 2^{\frac{P^{kj}_a(x,y)}{T^{kj}}}$ is the normalization factor and $\delta_r, T\delta_r^{\frac{1}{2}} \in \mathcal{E}^{\frac{1}{4}}$. Then, there is $\delta \in \mathcal{E}^{\frac{1}{4}}$ s.t.

$$\mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\hat{\sigma}^{kj}_{M_T}}[f_{\hat{M}^{kj}_T(x)}(x)] + T^{kj}\mathrm{H}(\hat{M}^{kj}_T(x))\right] \geq \mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\hat{\sigma}^{kj}_A}[f_{\hat{A}^{kj}(x)}(x)] + T^{kj}\mathrm{H}(\hat{A}^{kj}(x))\right] - \delta(k,j)$$

Here $\mathrm{H}$ stands for the (base 2) Shannon entropy of a probability measure on $A$.
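The defining property of the thermal distribution, that it maximizes free utility $\mathrm{E}_s[u] + T\mathrm{H}(s)$ among all mixed actions, can be checked numerically in a toy setting (invented names; the paper's $\hat{M}_T$ is a bounded bischeme, not this sketch):

```python
import math

# Toy thermalizer (invented names): play action a with probability
# proportional to 2^(P[a]/T). This is the distribution that maximizes the
# "free utility" E_s[u] + T*H(s), H being base-2 Shannon entropy.

def thermalize(P, T):
    Z = sum(2.0 ** (P[a] / T) for a in P)  # normalization factor Z
    return {a: 2.0 ** (P[a] / T) / Z for a in P}

def free_utility(s, u, T):
    H = -sum(p * math.log2(p) for p in s.values() if p > 0)
    return sum(s[a] * u[a] for a in s) + T * H

u = {"a": 0.8, "b": 0.3, "c": 0.5}
T = 0.2
s_thermal = thermalize(u, T)  # thermalizer fed the exact utilities

# It attains higher free utility than e.g. the uniform mix or the pure argmax:
uniform = {a: 1.0 / 3.0 for a in u}
argmax = {"a": 1.0, "b": 0.0, "c": 0.0}
```

As $T \to 0$ the thermalizer approaches the strict maximizer of Theorem 1; as $T \to \infty$ it approaches the uniform distribution on $A$.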
Corollary 3
Fix $\phi \in \Phi$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a: \mathrm{supp}\,\mu \to [0,1]\}_{a \in A}$. Suppose $\{\hat{P}_a\}_{a \in A}$ are $\mathcal{E}_2(\mathrm{ll},\phi)$(poly,rlog)-optimal predictors for $f$ and $T: \mathbb{N} \to \mathbb{R}^{>0}$ a bounded function. Consider $\epsilon \in (0,1)$ and $\hat{A}$ an $A$-valued $\phi^{1-\epsilon}$(poly,rlog)-scheme. Assume $\hat{M}_T$ is as in Theorem 2. Then

$$\mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\lambda^k_\phi}\!\left[\mathrm{E}_{\hat{\sigma}^{kj}_{M_T}}[f_{\hat{M}^{kj}_T(x)}(x)]\right] + T^k \mathrm{H}\!\left(\mathrm{E}_{\lambda^k_\phi}[\hat{M}^{kj}_T(x)]\right)\right] \geq \mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\hat{\sigma}^k_A}[f_{\hat{A}^k(x)}(x)] + T^k \mathrm{H}(\hat{A}^k(x))\right] - O(\phi^{-\frac{1}{\infty}})$$

Here $\mathrm{E}_{\lambda^k_\phi}[\hat{M}^{kj}_T(x)]$ stands for averaging the probability measure $\hat{M}^{kj}_T(x)$ on $A$ with respect to the parameter $j$ using the probability distribution $\lambda^k_\phi$.
Construction 3
Consider $G = (\mu, P, A, u)$ a distributional game, $\{\phi_n \in \Phi\}_{n \in P}$ and $T: P \times \mathbb{N} \to \mathbb{R}^{>0}$. We construct the reflective system $\mathcal{R}^{\phi,T}_G = (P, u^{\phi,T}, \mu_G)$.

Define $s^n_{G,\phi,T}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \Delta^n_G$ by

$$\Pr_{s^{nk}_{G,\phi,T}(x,\pi)}[a] := \mathrm{E}_{\lambda^k_{\phi_n}}\!\left[\mathrm{E}_{U(\pi)^{nkj}}\!\left[Z^{nkj}_{G,\phi,T}(x,y,\pi)^{-1}\, 2^{(T^{nk})^{-1}\pi^{nkj}((k,x,a),y)}\right]\right]$$

Here $Z^{nkj}_{G,\phi,T}(x,y,\pi)$ is the normalization factor.

Define $s_{G,\phi,T}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \prod_{n \in P} \Delta^n_G$ by

$$s^k_{G,\phi,T}(x, \pi) := \prod_{n \in P} s^{nk}_{G,\phi,T}(x, \pi)$$

Define $\bar{s}^n_{G,\phi,T}: \mathbb{N} \times \mathrm{supp}\,\mu \times \Pi^P \to \prod_{m \in P \setminus n} \Delta^m_G$ by

$$\bar{s}^{nk}_{G,\phi,T}(x, \pi) := \prod_{m \in P \setminus n} s^{mk}_{G,\phi,T}(x, \pi)$$

Finally, define $u^{\phi,T}$ by

$$u^{nk}_{\phi,T}(x, a, \pi) := \mathrm{E}_{a \times \bar{s}^{nk}_{G,\phi,T}(x,\pi)}[u_n]$$

We define the notations $s^{nk}_{G,\phi,T}(x, \hat{Q})$, $s^k_{G,\phi,T}(x, \hat{Q})$ and $\bar{s}^{nk}_{G,\phi,T}(x, \hat{Q})$ analogously to before.
Corollary 4
Fix $G = (\mu, P, A, u)$ a distributional game, $\{\phi_n \in \Phi\}_{n \in P}$, $T: P \times \mathbb{N} \to \mathbb{R}^{>0}$ bounded and $\hat{P}$ an $\mathcal{E}^*_2(\mathrm{ll},\phi)$(poly,rlog)-optimal predictor system for $\mathcal{R}^{\phi,T}_G$. Consider $n \in P$, $\epsilon \in (0,1)$ and $\hat{A}$ an $A_n$-valued $(\phi_n)^{1-\epsilon}$(poly,rlog)-scheme. Then

$$\mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{s^k_{G,\phi,T}(x,\hat{P})}[u_n] + T^{nk}\mathrm{H}(s^{nk}_{G,\phi,T}(x,\hat{P}))\right] \geq \mathrm{E}_{\mu^k}\!\left[\mathrm{E}_{\hat{A}^k(x) \times \bar{s}^{nk}_{G,\phi,T}(x,\hat{P})}[u_n] + T^{nk}\mathrm{H}(\hat{A}^k(x))\right] - O((\phi_n)^{-\frac{1}{\infty}})$$
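In the unbounded limit, the thermalizer equilibria of Corollary 4 are analogous to logit quantal response equilibria. A toy fixed-point iteration (invented names, not the paper's construction):

```python
# Toy fixed-point iteration for a thermal ("logit") equilibrium (invented
# names; an unbounded analogue of the equilibria in Corollary 4): each
# player repeatedly plays the thermal response to the other's mixed action.

def thermal_response(U, opponent, T):
    # expected utility of each own action against the opponent's mixed action
    eu = [sum(q * row[j] for j, q in enumerate(opponent)) for row in U]
    w = [2.0 ** (e / T) for e in eu]
    Z = sum(w)
    return [x / Z for x in w]

# Symmetric 2x2 coordination game: payoff 1 for matching actions.
U = [[1.0, 0.0], [0.0, 1.0]]
T = 0.5
p, q = [0.9, 0.1], [0.9, 0.1]
for _ in range(200):
    p, q = thermal_response(U, q, T), thermal_response(U, p, T)
# At this temperature the thermal response map is a contraction, so the
# iteration converges to its unique fixed point, the uniform profile.
```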
Finally, we show that assuming $\phi_n$ doesn't depend on $n$ and $T$ doesn't fall to zero too fast, deterministic advice is sufficient to construct an optimal predictor system for $\mathcal{R}^{\phi,T}_G$.
Definition 3
Consider a reflective system $\mathcal{R} = (\Sigma, f, \mu)$ and $\{\phi_n \in \Phi\}_{n \in \Sigma}$. $\mathcal{R}$ is said to be $\phi$-Hoelder when there are $\{c_{nk} \in \mathbb{R}^{>0}\}_{n \in \Sigma, k \in \mathbb{N}}$, $\{\rho_n: \Sigma \to [0,1]\}_{n \in \Sigma}$, $\{\psi_{nm} \in \Phi\}_{n,m \in \Sigma}$, $\{\alpha_n \in (0,1]\}_{n \in \Sigma}$ and $\{\delta_n: \mathbb{N} \to \mathbb{R}^{\geq 0}\}_{n \in \Sigma}$ s.t.

(i) $\forall n \in \Sigma\ \exists \epsilon_n > 0: c_{nk} = O(\phi_n(k)^{\frac{\alpha_n}{2} - \epsilon_n})$

(ii) $\sum_{m \in \Sigma} \rho_n(m) = 1$

(iii) $\psi_{nm} \geq \phi_n$

(iv) $\delta_n(k) = O(\phi_n(k)^{-\frac{1}{\infty}})$

(v) $\mathrm{E}_{\mu^{nk}}[(f^n(x,\pi) - f^n(x,\tilde{\pi}))^2] \leq c_{nk}\,\mathrm{E}_{\rho_n}\!\left[\mathrm{E}_{\mu^{mk} \times \lambda^k_{\psi_{nm}}}\!\left[\mathrm{E}_{U(\pi)^{mkj} \times U(\tilde{\pi})^{mkj}}\!\left[(\pi^{mkj}(x,y) - \tilde{\pi}^{mkj}(x,y))^2\right]\right]\right]^{\alpha_n} + \delta_n(k)$
Theorem 3
Consider a finite set $\Sigma$, a reflective system $\mathcal{R} = (\Sigma, f, \mu)$ and $\{\phi_n \in \Phi\}_{n \in \Sigma}$. If $\mathcal{R}$ is $\phi$-Hoelder then it has an $\mathcal{E}^*_2(\mathrm{ll},\phi)$(poly,log)-optimal predictor system.
Theorem 4
Consider $G = (\mu, P, A, u)$ a distributional game, $\phi \in \Phi$ and $T: P \times \mathbb{N} \to \mathbb{R}^{>0}$. Assume $\frac{1}{T^{nk}} = O(\phi(k)^{\frac{1}{4} - \epsilon})$ for some $\epsilon > 0$. Then $\mathcal{R}^{\phi,T}_G$ is $\phi$-Hoelder.
Appendix A
Proposition A.1
Suppose $h: \mathbb{R}^{\geq 0} \to \mathbb{R}^{\geq 0}$ is a nondecreasing function s.t. $\int_0^\infty h(x)^{-1}\, dx < \infty$. Define $\delta_h: \mathbb{N}^2 \to \mathbb{R}^{\geq 0}$ by

$$\delta_h(k,j) := \min\!\left(\frac{\log\log(k+3)}{h(\log\log(j+3))},\, 1\right)$$

Then $\delta_h \in \mathcal{E}_2(\mathrm{ll})$.
Proof of Proposition A.1
Consider $\phi \in \Phi$. We have

$$\limsup_{k \to \infty} \phi(k)\mathrm{E}_{\lambda^k_\phi}[\delta_h(k,j)] = \limsup_{k \to \infty} \frac{\phi(k)\sum_{j=2}^{t_\phi(k)-1}(\log\log(j+1) - \log\log j)\,\delta_h(k,j)}{\log\log t_\phi(k)}$$

$$\limsup_{k \to \infty} \phi(k)\mathrm{E}_{\lambda^k_\phi}[\delta_h(k,j)] = \limsup_{k \to \infty} \frac{\phi(k)\int_{x=2}^{t_\phi(k)} \delta_h(k, \lfloor x \rfloor)\, d(\log\log x)}{\phi(k)\log\log k}$$

$$\limsup_{k \to \infty} \phi(k)\mathrm{E}_{\lambda^k_\phi}[\delta_h(k,j)] \leq \limsup_{k \to \infty} \int_{x=2}^{t_\phi(k)} h(\log\log(\lfloor x \rfloor + 3))^{-1}\, d(\log\log x)$$

$$\limsup_{k \to \infty} \phi(k)\mathrm{E}_{\lambda^k_\phi}[\delta_h(k,j)] \leq \limsup_{k \to \infty} \int_{x=2}^{t_\phi(k)} h(\log\log x)^{-1}\, d(\log\log x)$$

$$\limsup_{k \to \infty} \phi(k)\mathrm{E}_{\lambda^k_\phi}[\delta_h(k,j)] \leq \int_0^\infty h(t)^{-1}\, dt$$
Proof of Proposition 1
Mostly obvious modulo Proposition A.1. To see $\mathcal{E}_2(\mathrm{ll},\phi)$ is stable for nondecreasing $\phi$, consider a nonconstant polynomial $p: \mathbb{N} \to \mathbb{N}$ and $\delta \in \mathcal{E}_2(\mathrm{ll},\phi)$. Define $\delta'(k,j) := \delta(p(k), j)$. To get the desired condition for $\delta'$ and $\psi \in \Phi$ with $\psi \leq \phi$, consider any $\psi' \in \Phi$ s.t. $\psi' \leq \phi$ and for sufficiently large $k$ we have $\psi'(p(k)) = \psi(k)$. Such a $\psi'$ exists since for $k$ sufficiently large $\phi(k) \leq \phi(p(k))$. We have

$$\limsup_{k \to \infty} \psi'(k)\mathrm{E}_{\lambda^k_{\psi'}}[\delta(k,j)] < \infty$$

In particular

$$\limsup_{k \to \infty} \psi'(p(k))\mathrm{E}_{\lambda^{p(k)}_{\psi'}}[\delta(p(k),j)] < \infty$$

$$\limsup_{k \to \infty} \psi(k)\mathrm{E}_{\lambda^k_\psi}[\delta'(k,j)] < \infty$$
Proposition A.2
Consider a polynomial $q: \mathbb{N}^2 \to \mathbb{N}$. There is a function $\lambda_q: \mathbb{N}^3 \to [0,1]$ s.t.

(i) $\forall k, j \in \mathbb{N}: \sum_{i \in \mathbb{N}} \lambda_q(k,j,i) = 1$

(ii) For any function $\epsilon: \mathbb{N}^2 \to [0,1]$ we have

$$\epsilon(k,j) - \sum_{i \in \mathbb{N}} \lambda_q(k,j,i)\,\epsilon(k, q(k,j) + i) \in \mathcal{E}_2(\mathrm{ll})$$
Proof of Proposition A.2
Given functions $q_1, q_2: \mathbb{N}^2 \to \mathbb{N}$ s.t. $q_1(k,j) \geq q_2(k,j)$ for $k, j \gg 0$, the proposition for $q_1$ implies the proposition for $q_2$ by setting

$$\lambda_{q_2}(k,j,i) := \begin{cases} \lambda_{q_1}(k,j,\, i - q_1(k,j) + q_2(k,j)) & \text{if } i - q_1(k,j) + q_2(k,j) \geq 0 \\ 0 & \text{if } i - q_1(k,j) + q_2(k,j) < 0 \end{cases}$$

Therefore, it is enough to prove the proposition for functions of the form $q(k,j) = j^{m + n\log k}$ for $m > 0$.

Consider any $\phi \in \Phi$. We have

$$\limsup_{k \to \infty} \phi(k)\,\frac{\log\log k}{\phi(k)\log\log k} < \infty$$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\log(m + n\log k)}{\phi(k)\log\log k} < \infty$$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{2^{m+n\log k}} d(\log\log x)}{\phi(k)\log\log k} < \infty$$

Since $\epsilon$ takes values in $[0,1]$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{2^{m+n\log k}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\phi(k)\log\log k} < \infty$$

Similarly

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=t_\phi(k)}^{t_\phi(k)^{m+n\log k}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\phi(k)\log\log k} < \infty$$

The last two equations imply that

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{t_\phi(k)} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x) - \int_{x=2^{m+n\log k}}^{t_\phi(k)^{m+n\log k}} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x)}{\phi(k)\log\log k} < \infty$$

Raising $x$ to a power is equivalent to adding a constant to $\log\log x$, therefore

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{t_\phi(k)} \epsilon(k, \lfloor x \rfloor)\, d(\log\log x) - \int_{x=2}^{t_\phi(k)} \epsilon(k, \lfloor x^{m+n\log k} \rfloor)\, d(\log\log x)}{\phi(k)\log\log k} < \infty$$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{t_\phi(k)} \left(\epsilon(k, \lfloor x \rfloor) - \epsilon(k, \lfloor x^{m+n\log k} \rfloor)\right) d(\log\log x)}{\phi(k)\log\log k} < \infty$$

Since $\lfloor x^{m+n\log k} \rfloor \geq \lfloor x \rfloor^{m+n\log k}$ we can choose $\lambda_q$ satisfying condition (i) so that

$$\int_{x=j}^{j+1} \epsilon(k, \lfloor x^{m+n\log k} \rfloor)\, d(\log\log x) = (\log\log(j+1) - \log\log j)\sum_i \lambda_q(k,j,i)\,\epsilon(k, j^{m+n\log k} + i)$$

It follows that

$$\int_{x=j}^{j+1} \epsilon(k, \lfloor x^{m+n\log k} \rfloor)\, d(\log\log x) = \int_{x=j}^{j+1} \sum_i \lambda_q(k, \lfloor x \rfloor, i)\,\epsilon(k, \lfloor x \rfloor^{m+n\log k} + i)\, d(\log\log x)$$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\int_{x=2}^{t_\phi(k)} \left(\epsilon(k, \lfloor x \rfloor) - \sum_i \lambda_q(k, \lfloor x \rfloor, i)\,\epsilon(k, \lfloor x \rfloor^{m+n\log k} + i)\right) d(\log\log x)}{\phi(k)\log\log k} < \infty$$

$$\limsup_{k \to \infty} \phi(k)\,\frac{\sum_{j=2}^{t_\phi(k)-1} (\log\log(j+1) - \log\log j)\left(\epsilon(k,j) - \sum_i \lambda_q(k,j,i)\,\epsilon(k, j^{m+n\log k} + i)\right)}{\phi(k)\log\log k} < \infty$$

$$\epsilon(k,j) - \sum_{i \in \mathbb{N}} \lambda_q(k,j,i)\,\epsilon(k, q(k,j) + i) \in \mathcal{E}_2(\mathrm{ll})$$
Lemma A.1
Consider $(f, \mu)$ a distributional estimation problem and $\hat{P}$, $\hat{Q}$ (poly,rlog)-predictors. Suppose $p: \mathbb{N}^2 \to \mathbb{N}$ a polynomial, $\phi \in \Phi$ and $\delta \in \mathcal{E}_2(\mathrm{ll},\phi)$ are s.t.

$$\forall i, k, j \in \mathbb{N}: \mathrm{E}[(\hat{P}^{k, p(k,j)+i} - f)^2] \leq \mathrm{E}[(\hat{Q}^{kj} - f)^2] + \delta(k,j)$$

Then $\exists \delta' \in \mathcal{E}_2(\mathrm{ll},\phi)$ s.t.

$$\mathrm{E}[(\hat{P}^{kj} - f)^2] \leq \mathrm{E}[(\hat{Q}^{kj} - f)^2] + \delta'(k,j)$$
The proof of Lemma A.1 is analogous to before and we omit it.
Appendix B
The following is a slightly stronger version of one direction of the orthogonality lemma.
Lemma B.1
Consider $(f, \mu)$ a distributional estimation problem and $\hat{P}$ an $\mathcal{E}$(poly,rlog)-optimal predictor for $(f, \mu)$. Then there are $c_1, c_2 \in \mathbb{R}$ and an $\mathcal{E}^{\frac{1}{2}}$-moderate function $\delta: \mathbb{N}^5 \to [0,1]$ s.t. for any $k, j, r, l, t \in \mathbb{N}$, $\tau$ a probability measure on $\{0,1\}^{\leq l}$ and $Q: \mathrm{supp}\,\mu^k \times \{0,1\}^{r_P(k,j)} \times \{0,1\}^* \times \{0,1\}^r \times \{0,1\}^{\leq l} \xrightarrow{\mathrm{alg}} \mathbb{Q}$ that runs in time $t$ on any valid input

$$\mathrm{E}_{\mu^k \times U^{r_P(k,j)} \times \sigma^{kj}_P \times U^r \times \tau}\!\left[Q(x,y,z,u,w)\,(P^{kj}(x,y,z) - f(x))\right] \leq \left(c_1 + c_2\,\mathrm{E}_{\mu^k \times U^{r_P(k,j)} \times \sigma^{kj}_P \times U^r \times \tau}[Q(x,y,z,u,w)^2]\right)\delta(k,j,t,2^l,2^{|Q|})$$
The proof is analogous to the original and we omit it.
Lemma B.2
Fix a protoerror space $\mathcal{E}$. Consider $A$ a finite set, $\mu$ a word ensemble and $\{f_a: \mathrm{supp}\,\mu \to [0,1]\}_{a \in A}$. Suppose $\{\hat{P}_a\}_{a \in A}$ are $\mathcal{E}$(poly,rlog)-optimal predictors for $f$. Assume that $\sigma_{P_a}$ and $r_{P_a}$ don't depend on $a$. Consider $\hat{A}$ an $A$-valued (poly,rlog)-bischeme. Assume $\hat{\sigma}_A$ factors as $\hat{\sigma}_P \times \tau$. Then

$$\mathrm{E}_{\mu \times \hat{\sigma}_P \times \tau}[P_{A(x,y,z)}(x,y) - f_{A(x,y,z)}(x)] \in \mathcal{E}^{\frac{1}{2}}$$
Proof of Lemma B.2
$$\mathrm{E}_{\mu \times \hat{\sigma}_P \times \tau}[P_{A(x,y,z)}(x,y) - f_{A(x,y,z)}(x)] = \sum_{a \in A} \mathrm{E}_{\mu \times \hat{\sigma}_P \times \tau}[\delta_{A(x,y,z),a}\,(P_a(x,y) - f_a(x))]$$
Applying Lemma B.1 we get the desired result.
Proof of Theorem 1
Lemma B.2 implies
$$\mathrm{E}_{\mu \times \hat{\sigma}_P \times \hat{\sigma}_A}[\hat{P}_{\hat{A}} - f_{\hat{A}}] \in \mathcal{E}^{\frac{1}{2}}$$
