% operators that are separated from the operand by a space
% operators that require brackets
% operators that require parentheses
% Paper specific
This post continues the research programme of attacking the grain of truth problem by departure from the Bayesian paradigm. In the previous post, I suggested using Savage's minimax regret decision rule, but here I fall back to the simple minimax decision rule. This is because the mathematics is considerably simpler, and minimax should be sufficient to get IUD play in general games and Nash equilibrium in zero-sum two-player games. I hope to build on these results to get analogous results for minimax regret in the future.
We consider "semi-Bayesian" agents following the minimax expected utility decision rule, in oblivious environments with full monitoring (a setting that we will refer to as "forecasting"). This setting is considered in order to avoid the need to enforce exploration, as a preparation for analysis of general environments. We show that such agents satisfy a certain asymptotic optimality theorem. Intuitively, this theorem means that whenever the environment satisfies an incomplete model that is included in the prior, the agent will eventually learn this model i.e. extract at least as much utility as can be guaranteed for this model.
Appendix A contains the proofs of all results. Appendix B contains some theorems from the literature together with a corollary used in Appendix A.
##Notation
Given X a topological space, P(X) will denote the space of Borel probability measures on X. We regard it as a topological space using the weak∗ topology. Given x∈X, δx∈P(X) is defined by δx(A):=[[x∈A]]. Given μ,ν∈P(X), dtv(μ,ν) stands for the total variation distance between μ and ν. We denote PC(X) the set of non-empty convex compact subsets of P(X).
Given X a finite set, X∗ denotes the set of finite strings in alphabet X, i.e. X∗:=⨆n∈NXn. Xω denotes the set of infinite strings in alphabet X. Given x∈X∗⊔Xω, |x|∈N⊔{∞} is the length of string. Given 0≤n<|x|, xn∈X is the n-th symbol in x. Given 0≤n≤|x|, x<n is the prefix of x of length n. Given x,y∈X∗⊔Xω, the notations x⊏y, x⊑y, x⊏/y and x⋢y mean "x is a proper prefix of y", "x is a prefix of y" and their negations. λX∈X∗ is the empty string and X+:=X∗∖λX.
##Elementary properties of minimax
Fix S and E compact Polish spaces, S representing the agent's pure policies and E representing the pure environments. Let u:S×E→R be a continuous utility function.
#Proposition 1
Consider Φ⊆P(E). There exists
π∗∈argmaxπ∈P(S)infμ∈ΦEπ×μ[u]
Moreover, let ¯Φ be the closure of the convex hull of Φ. Then, the same π∗ satisfies
π∗∈argmaxπ∈P(S)minμ∈¯ΦEπ×μ[u]
We will say that such a π∗ is a minimax policy for Φ.
#Proposition 2
Consider Φ,Φ′∈PC(E) and p∈[0,1]. Then, there is ν∗∈Φ′ s.t. any minimax policy for pΦ+(1−p)Φ′ is a minimax policy for pΦ+(1−p)ν∗.
In particular, Proposition 2 implies that a minimax policy for Φ is the optimal policy for some μ∗∈Φ.
Now we ask what policy is implemented by "subagents" created by a minimax policy. Suppose S=S1×S2, where S2 represents the pure policies of the subagent. Assume that there is a Borel set A⊆E (the condition for the creation of the subagent), a finite set T (the actions taken before the creation of the subagent), a Borel measurable mapping α:S1→T and continuous functions u1:S1×E→R and u2:S2×T×E→R s.t.
u(s1,s2,e)=u1(s1,e)+χA(e)u2(s2,α(s1),e)
Consider Φ∈PC(E) and π∗∈P(S) minimax for Φ. Denote pr1,2:S→S1,2 the projection mappings. Define π∗1∈P(S1) by
π∗1:=pr1∗π∗
Define π∗2:T→P(S2) by
π∗2(t):=pr2∗(π∗∣α−1(t)×S2)
If for some t0∈T we have π∗(α−1(t0)×S2)=0, then π∗2(t0) can be arbitrary.
It should also be possible to make do without T and the associated decomposition of u by instead decomposing π∗ into π∗1 and a Markov kernel from S1 to S2.
##Asymptotic optimality
Fix finite sets A (actions) and O (observations). We now assume that E=Oω and S=O∗→A with the product topology. We fix γ:N→R≥0 a time discount function satisfying ∑nγ(n)<∞ and r:(A×O)∗→R a bounded reward function. Given e∈O∗ and s:O∗→A, we define es∈(A×O)ω by
esn:=(s(e<n),en)
The utility function is then given by
u(s,e)=∑n∈Nγ(n)r(es<n)
We will need a notation for a combination of two policies where the agent switches from one to another when observing some x∈O∗. Given s1:O∗→A and s2:O∗→A, we define s1×xs2:O∗→A by
(s1×xs2)(y):={s1(y) if y⋣xs2(z) if y=xz
Consider any π∈P(O∗→A) and ρ:A|x|→P(O∗→A). We define π⊗xρ∈P(O∗→A) as follows. For any Borel measurable B⊆O∗→A:
It is easy to see the above indeed defines a probability measure.
Note that changing the policy after a certain event cannot change utility more than O(1) times the probability of the event times the corresponding time discount factor. That is, we have the following:
We are now ready to formulate the optimality theorem. Consider Φ,Φ′∈PC(Oω) and p∈(0,1]. Denote Ψ=pΦ+(1−p)Φ′. We think of Ψ as the prior, Φ as one of the models in the prior (assigned probability p) and Φ′ as the convex combination of all other models. Let π∗∈P(O∗→A) be a minimax policy for Ψ. Define v:O∗→R by
v(x):=maxρ∈A|x|→P(O∗→A)minμ∈ΦE(π∗⊗xρ)×μ[u]
That is, v is the expected utility that can be guaranteed by changing the policy following x, assuming that the true environment is in Φ. We claim that if the true environment is in Φ, then after sufficient time π∗ will achieve at least as much utility as can be guaranteed for Φ (guaranteed under the constraint of following π∗ until the point of making the current observations). That is, v can only be greater than Eπ∗×μ[u] by a quantity that goes to 0 with probability 1, even after normalizing according to Proposition~5:
We will repeatedly use the standard fact that, given X a compact Polish space, P(X) is also a compact Polish space (it is metrizable by the Levy-Prokhorov metric, compact as a consequence of the Banach-Alaoglu theorem and the fact that any probability measure on a Polish space is Radon, and separability is also not hard to prove).
#Proposition A.1
Consider X,Y compact Polish spaces, f:X×Y→R continuous, x∞∈X and a sequence xn→x∞. Define gn:Y→R by gn(y):=f(xn,y) and g∞:Y→R by g∞(y):=f(x∞,y). Then, gn converges to g∞ uniformly.
#Proof of Proposition A.1
Assume to the contrary that there is ϵ>0 and a subsequence {x′n∈X}n∈N of {xn} s.t.
maxy∈Y|f(x′n,y)−f(x∞,y)|>ϵ
This implies we can choose {yn∈Y}n∈N s.t. |f(x′n,yn)−f(x∞,yn)|>ϵ. Let {nk∈N}k∈N be an increasing sequence s.t. ynk→y∞ for some y∞∈Y. We get |f(x′nk,ynk)−f(x∞,ynk)|>ϵ, which is a contradiction because f(x′nk,ynk)→f(x∞,y∞) and f(x∞,ynk)→f(x∞,y∞).
#Proposition A.2
Consider X,Y compact Polish spaces and f:X×Y→R continuous. Define ϕ:X×P(Y)→R by
ϕ(x,ν):=Ey∼ν[f(x,y)]
Then, ϕ is continuous.
#Proof of Proposition A.2
Consider any x∞∈X, ν∞∈P(Y) and sequences xn→x∞, νn→ν∞. We have
By Proposition A.1, the right hand converges to 0, therefore the left hand side also converges to 0.
In the following, we will use the notation U:P(S)×P(E)→R, defined by U(π,μ):=Eπ×μ[u]. By Proposition A.3, U is continuous.
#Proof of Proposition 1
Define UΦ:P(S)→R by UΦ(π):=infμ∈ΦU(π,μ). By Proposition A.4, UΦ is continuous and therefore, π∗ exists.
U is continuous and affine in the 2nd argument (i.e. it sends convex linear combinations to convex linear combinations), and therefore infμ∈ΦU(π,μ)=minμ∈¯ΦU(π,μ), implying the second part of the proposition.
#Proof of Proposition 2
Define V:P(E)→R by V(μ):=maxπ∈P(S)U(π,μ)=maxs∈SU(δs,μ). Denote Ψ:=pΦ+(1−p)Φ′. By Proposition A.4, V is continuous and therefore, there exists
ξ∗∈argminμ∈ΨV(μ)
Choose μ∗∈Φ, ν∗∈Φ′ s.t. ξ∗=pμ∗+(1−p)ν∗.
P(S) and Ψ are compact convex sets in the spaces of finite signed Borel measures on S and E respectively. Therefore, we can apply the minimax theorem to get
The desired result now follows from (infr)∑n>|x|γ(n)≤u2≤(supr)∑n>|x|γ(n).
#Definition A.2
A splitting of (S,E,u) is (S1,S2,A,T,α,u1,u2) s.t. S≅S1×S2, A⊆E is Borel, T is a finite set, α:S1→T is Borel measurable, u1:S1×E→R and u2:S2×T×E→R are continuous and
u(s1,s2,e)=u1(s1,e)+χA(e)u2(s2,α(s1),e)
#Proposition A.8
Consider Φ∈PC(E), ν∗∈P(E) and p∈[0,1], and denote Ψ:=pΦ+(1−p)ν∗. Suppose that (S1,S2,A,T,α,u1,u2) is a splitting of (S,E,u). Assume ν∗(A)>0. Fix π∈P(S1). Let ρ∗:T→P(S2) satisfy
ρ∗∈argmaxρ:T→P(S2)minξ∈ΨU(π⊗αρ,ξ)
Denote Γ:=supu2−infu2. Then, for any μ0∈Φ s.t. μ0(A)>0
%&latex
% operators that are separated from the operand by a space
% operators that require brackets
% operators that require parentheses
% Paper specific
This post continues the research programme of attacking the grain of truth problem by departure from the Bayesian paradigm. In the previous post, I suggested using Savage's minimax regret decision rule, but here I fall back to the simple minimax decision rule. This is because the mathematics is considerably simpler, and minimax should be sufficient to get IUD play in general games and Nash equilibrium in zero-sum two-player games. I hope to build on these results to get analogous results for minimax regret in the future.
We consider "semi-Bayesian" agents following the minimax expected utility decision rule, in oblivious environments with full monitoring (a setting that we will refer to as "forecasting"). This setting is considered in order to avoid the need to enforce exploration, as a preparation for analysis of general environments. We show that such agents satisfy a certain asymptotic optimality theorem. Intuitively, this theorem means that whenever the environment satisfies an incomplete model that is included in the prior, the agent will eventually learn this model i.e. extract at least as much utility as can be guaranteed for this model.
Appendix A contains the proofs of all results. Appendix B contains some theorems from the literature together with a corollary used in Appendix A.
##Notation
Given X a topological space, P(X) will denote the space of Borel probability measures on X. We regard it as a topological space using the weak∗ topology. Given x∈X, δx∈P(X) is defined by δx(A):=[[x∈A]]. Given μ,ν∈P(X), dtv(μ,ν) stands for the total variation distance between μ and ν. We denote PC(X) the set of non-empty convex compact subsets of P(X).
Given X a finite set, X∗ denotes the set of finite strings in alphabet X, i.e. X∗:=⨆n∈NXn. Xω denotes the set of infinite strings in alphabet X. Given x∈X∗⊔Xω, |x|∈N⊔{∞} is the length of string. Given 0≤n<|x|, xn∈X is the n-th symbol in x. Given 0≤n≤|x|, x<n is the prefix of x of length n. Given x,y∈X∗⊔Xω, the notations x⊏y, x⊑y, x⊏/y and x⋢y mean "x is a proper prefix of y", "x is a prefix of y" and their negations. λX∈X∗ is the empty string and X+:=X∗∖λX.
##Elementary properties of minimax
Fix S and E compact Polish spaces, S representing the agent's pure policies and E representing the pure environments. Let u:S×E→R be a continuous utility function.
#Proposition 1
Consider Φ⊆P(E). There exists
π∗∈argmaxπ∈P(S)infμ∈ΦEπ×μ[u]
Moreover, let ¯Φ be the closure of the convex hull of Φ. Then, the same π∗ satisfies
π∗∈argmaxπ∈P(S)minμ∈¯ΦEπ×μ[u]
We will say that such a π∗ is a minimax policy for Φ.
#Proposition 2
Consider Φ,Φ′∈PC(E) and p∈[0,1]. Then, there is ν∗∈Φ′ s.t. any minimax policy for pΦ+(1−p)Φ′ is a minimax policy for pΦ+(1−p)ν∗.
In particular, Proposition 2 implies that a minimax policy for Φ is the optimal policy for some μ∗∈Φ.
Now we ask what policy is implemented by "subagents" created by a minimax policy. Suppose S=S1×S2, where S2 represents the pure policies of the subagent. Assume that there is a Borel set A⊆E (the condition for the creation of the subagent), a finite set T (the actions taken before the creation of the subagent), a Borel measurable mapping α:S1→T and continuous functions u1:S1×E→R and u2:S2×T×E→R s.t.
u(s1,s2,e)=u1(s1,e)+χA(e)u2(s2,α(s1),e)
Consider Φ∈PC(E) and π∗∈P(S) minimax for Φ. Denote pr1,2:S→S1,2 the projection mappings. Define π∗1∈P(S1) by
π∗1:=pr1∗π∗
Define π∗2:T→P(S2) by
π∗2(t):=pr2∗(π∗∣α−1(t)×S2)
If for some t0∈T we have π∗(α−1(t0)×S2)=0, then π∗2(t0) can be arbitrary.
#Proposition 3
π∗2∈argmaxπ2:T→P(S2)minμ∈Φ(Eπ∗1×μ[u1]+μ(A)Et∼α∗π∗1[Eπ2(t)×μ∣A[u2(t)]])
It should also be possible to make do without T and the associated decomposition of u by instead decomposing π∗ into π∗1 and a Markov kernel from S1 to S2.
##Asymptotic optimality
Fix finite sets A (actions) and O (observations). We now assume that E=Oω and S=O∗→A with the product topology. We fix γ:N→R≥0 a time discount function satisfying ∑nγ(n)<∞ and r:(A×O)∗→R a bounded reward function. Given e∈O∗ and s:O∗→A, we define es∈(A×O)ω by
esn:=(s(e<n),en)
The utility function is then given by
u(s,e)=∑n∈Nγ(n)r(es<n)
We will need a notation for a combination of two policies where the agent switches from one to another when observing some x∈O∗. Given s1:O∗→A and s2:O∗→A, we define s1×xs2:O∗→A by
(s1×xs2)(y):={s1(y) if y⋣xs2(z) if y=xz
Consider any π∈P(O∗→A) and ρ:A|x|→P(O∗→A). We define π⊗xρ∈P(O∗→A) as follows. For any Borel measurable B⊆O∗→A:
(π⊗xρ)(B):=∑t∈A|x|Prs1∼πs2∼ρ(t)[s1×xs2∈B∧∀n<|x|:tn=s1(x<n)]
It is easy to see the above indeed defines a probability measure.
Note that changing the policy after a certain event cannot change utility more than O(1) times the probability of the event times the corresponding time discount factor. That is, we have the following:
#Proposition 4
|E(π⊗xρ)×μ[u]−Eπ×μ[u]|≤(supr−infr)Pre∼μ[x⊏e]∑n>|x|γ(n)
We are now ready to formulate the optimality theorem. Consider Φ,Φ′∈PC(Oω) and p∈(0,1]. Denote Ψ=pΦ+(1−p)Φ′. We think of Ψ as the prior, Φ as one of the models in the prior (assigned probability p) and Φ′ as the convex combination of all other models. Let π∗∈P(O∗→A) be a minimax policy for Ψ. Define v:O∗→R by
v(x):=maxρ∈A|x|→P(O∗→A)minμ∈ΦE(π∗⊗xρ)×μ[u]
That is, v is the expected utility that can be guaranteed by changing the policy following x, assuming that the true environment is in Φ. We claim that if the true environment is in Φ, then after sufficient time π∗ will achieve at least as much utility as can be guaranteed for Φ (guaranteed under the constraint of following π∗ until the point of making the current observations). That is, v can only be greater than Eπ∗×μ[u] by a quantity that goes to 0 with probability 1, even after normalizing according to Proposition~5:
#Theorem 1
∀μ∈Φ:Pre∼μ[limn→∞max(v(e<n)−Eπ∗×μ[u]Pre′∼μ[e<n⊏e′]∑m>nγ(m),0)=0]=1
##Appendix A
We will repeatedly use the standard fact that, given X a compact Polish space, P(X) is also a compact Polish space (it is metrizable by the Levy-Prokhorov metric, compact as a consequence of the Banach-Alaoglu theorem and the fact that any probability measure on a Polish space is Radon, and separability is also not hard to prove).
#Proposition A.1
Consider X,Y compact Polish spaces, f:X×Y→R continuous, x∞∈X and a sequence xn→x∞. Define gn:Y→R by gn(y):=f(xn,y) and g∞:Y→R by g∞(y):=f(x∞,y). Then, gn converges to g∞ uniformly.
#Proof of Proposition A.1
Assume to the contrary that there is ϵ>0 and a subsequence {x′n∈X}n∈N of {xn} s.t.
maxy∈Y|f(x′n,y)−f(x∞,y)|>ϵ
This implies we can choose {yn∈Y}n∈N s.t. |f(x′n,yn)−f(x∞,yn)|>ϵ. Let {nk∈N}k∈N be an increasing sequence s.t. ynk→y∞ for some y∞∈Y. We get |f(x′nk,ynk)−f(x∞,ynk)|>ϵ, which is a contradiction because f(x′nk,ynk)→f(x∞,y∞) and f(x∞,ynk)→f(x∞,y∞).
#Proposition A.2
Consider X,Y compact Polish spaces and f:X×Y→R continuous. Define ϕ:X×P(Y)→R by
ϕ(x,ν):=Ey∼ν[f(x,y)]
Then, ϕ is continuous.
#Proof of Proposition A.2
Consider any x∞∈X, ν∞∈P(Y) and sequences xn→x∞, νn→ν∞. We have
ϕ(xn,νn)−ϕ(x∞,ν∞)=Ey∼νn[f(xn,y)−f(x∞,y)]+Ey∼νn[f(x∞,y)]−Ey∼ν∞[f(x∞,y)]
On the right hand side, the first term goes to 0 by Proposition A.1 and the second term goes to 0 by definition of the weak* topology.
#Proposition A.3
Consider any X,Y compact Polish and f:X×Y→R continuous. Define F:P(X)×P(Y)→R by
F(μ,ν):=Eμ×ν[f]
Then, F is continuous.
#Proof of Proposition A.3
Follows by applying Fubini's theorem and using Proposition A.2 twice.
#Proposition A.4
Consider any X,Y compact Polish, A⊆Y and f:X×Y→R continuous. Define fA:X→R by fA(x):=infy∈Af(x,y). Then, fA is continuous.
#Proof of Proposition A.4
Consider any x∞∈X and a sequence xn→x∞
|infy∈Af(xn,y)−infy∈Af(x∞,y)|≤supy∈A|f(xn,y)−f(x∞,y)|
By Proposition A.1, the right hand converges to 0, therefore the left hand side also converges to 0.
In the following, we will use the notation U:P(S)×P(E)→R, defined by U(π,μ):=Eπ×μ[u]. By Proposition A.3, U is continuous.
#Proof of Proposition 1
Define UΦ:P(S)→R by UΦ(π):=infμ∈ΦU(π,μ). By Proposition A.4, UΦ is continuous and therefore, π∗ exists.
U is continuous and affine in the 2nd argument (i.e. it sends convex linear combinations to convex linear combinations), and therefore infμ∈ΦU(π,μ)=minμ∈¯ΦU(π,μ), implying the second part of the proposition.
#Proof of Proposition 2
Define V:P(E)→R by V(μ):=maxπ∈P(S)U(π,μ)=maxs∈SU(δs,μ). Denote Ψ:=pΦ+(1−p)Φ′. By Proposition A.4, V is continuous and therefore, there exists
ξ∗∈argminμ∈ΨV(μ)
Choose μ∗∈Φ, ν∗∈Φ′ s.t. ξ∗=pμ∗+(1−p)ν∗.
P(S) and Ψ are compact convex sets in the spaces of finite signed Borel measures on S and E respectively. Therefore, we can apply the minimax theorem to get
maxπ∈P(S)minξ∈ΨU(π,ξ)=minξ∈Ψmaxs∈SU(δs,ξ)
Denote the above number u∗. We have
minμ∈ΦU(π∗,pμ+(1−p)ν∗)≥minξ∈ΨU(π∗,ξ)=u∗
On the other hand
maxπ∈P(S)minμ∈ΦU(π,pμ+(1−p)ν∗)≤maxπ∈P(S)U(π,ξ∗)=u∗
Combining the inequalities, we get the desired result.
#Definition A.1
In the setting of Proposition 3, consider any π1∈P(S1) and π2:T→P(S2). We define π1⊗απ2∈P(S1×S2) as follows. For any Borel measurable B⊆S1×S2:
(π1⊗απ2)(B):=∑t∈T(π1×π2(t))(B∩(α−1(t)×S2))
It is easy to see the above indeed defines a probability measure.
#Proposition A.5
Consider any π1∈P(S1) and π2:T→P(S2) and f:S1×S2→R Borel measurable and bounded. Then:
Eπ1⊗απ2[f]=Es1∼π1[Es2∼π2(α(s1))[f(s1,s2)]]
#Proof of Proposition A.5
Eπ1⊗απ2[f]=∑t∈T∫α−1(t)×S2f(s1,s2)d(π1×π2(t))(s2)
Eπ1⊗απ2[f]=∑t∈T∫α−1(t)∫S2f(s1,s2)dπ2(t)(s2)dπ1(s1)
Eπ1⊗απ2[f]=∑t∈T∫α−1(t)Es2∼π2(t)[f(s1,s2)]dπ1(s1)
Eπ1⊗απ2[f]=∑t∈T∫α−1(t)Es2∼π2(α(s1))[f(s1,s2)]dπ1(s1)
Eπ1⊗απ2[f]=∫S1Es2∼π2(α(s1))[f(s1,s2)]dπ1(s1)
Eπ1⊗απ2[f]=Es1∼π1[Es2∼π2(α(s1))[f(s1,s2)]]
#Proposition A.6
Given π1∈P(S1), π2:T→P(S2) and μ∈P(E)
U(π1⊗απ2,μ)=Eπ1×μ[u1]+μ(A)Et∼α∗π1[Eπ2(t)×μ∣A[u2(t)]]
#Proof of Proposition A.6
Applying Proposition A.5:
U(π1⊗απ2,μ)=Es1∼π1[Es2∼π2(α(s1))[Ee∼μ[u1(s1,e)+χA(e)u2(s2,α(s1),e)]]]
U(π1⊗απ2,μ)=Eπ1×μ[u1]+Es1∼π1[Es2∼π2(α(s1))[μ(A)Ee∼μ∣A[u2(s2,α(s1),e)]]]
U(π1⊗απ2,μ)=Eπ1×μ[u1]+μ(A)Et∼α∗π1[Eπ2(t)×μ∣A[u2(t)]]
#Proposition A.7
Consider any π∈P(S1×S2). Define π1∈P(S1) and π2:T→P(S2) by
π1:=pr1∗π
π2(t):=pr2∗(π∣α−1(t)×S2)
If for some t0∈T we have π(α−1(t0)×S2)=0, then π2(t0) can be arbitrary.
Then,
∀μ∈P(E):U(π,μ)=U(π1⊗απ2,μ)
#Proof of Proposition A.7
U(π,μ)=E(s1,s2)∼πe∼μ[u1(s1,e)+χA(e)u2(s2,α(s1),e)]
U(π,μ)=Eπ1×μ[u1]+μ(A)E(s1,s2)∼πe∼μ∣A[u2(s2,α(s1),e)]
U(π,μ)=Eπ1×μ[u1]+μ(A)∑t∈TPr(s1,s2)∼π[α(s1)=t]E(s1,s2)∼π∣α−1(t)×S2e∼μ∣A[u2(s2,t,e)]
U(π,μ)=Eπ1×μ[u1]+μ(A)Et∼α∗π1[Eπ2(t)×μ∣A[u2(t)]]
Applying Proposition A.6, we get the desired result.
#Proof of Proposition 3
Consider any π2:T→P(S2). By definition of π∗
minμ∈ΦU(π∗1⊗απ2,μ)≤minμ∈ΦU(π∗,μ)
Applying Proposition A.7 to the right hand side
minμ∈ΦU(π∗1⊗απ2,μ)≤minμ∈ΦU(π∗1⊗απ∗2,μ)
Applying Proposition A.6 to both sides, we get the desired result.
Given any x∈O∗, we denote xOω:={xy∣y∈Oω} and ¯xOω:=Oω∖xOω.
#Proof of Proposition 4
We have
Eπ×μ[u]=μ(xOω)Eπ×μ∣xOω[u]+(1−μ(xOω))Eπ×μ∣¯xO∗[u]
E(π⊗xρ)×μ[u]=μ(xOω)E(π⊗xρ)×μ∣xOω[u]+(1−μ(xOω))Eπ×μ∣¯xOω[u]
E(π⊗xρ)×μ[u]−Eπ×μ[u]=μ(xOω)(E(π⊗xρ)×μ∣xOω[u]−Eπ×μ∣xOω[u])
Denote u1(s,e):=∑n≤|x|γ(n)r(es<n), u2(s,e):=∑n>|x|γ(n)r(es<n). We have
Eπ×μ∣xOω[u]=Eπ×μ∣xOω[u1]+Eπ×μ∣xO∗[u2]
E(π⊗xρ)×μ∣xOω[u]=Eπ×μ∣xOω[u1]+E(π⊗xρ)×μ∣xOω[u2]
E(π⊗xρ)×μ[u]−Eπ×μ[u]=μ(xOω)(E(π⊗xρ)×μ∣xOω[u2]−Eπ×μ∣xOω[u2])
The desired result now follows from (infr)∑n>|x|γ(n)≤u2≤(supr)∑n>|x|γ(n).
#Definition A.2
A splitting of (S,E,u) is (S1,S2,A,T,α,u1,u2) s.t. S≅S1×S2, A⊆E is Borel, T is a finite set, α:S1→T is Borel measurable, u1:S1×E→R and u2:S2×T×E→R are continuous and
u(s1,s2,e)=u1(s1,e)+χA(e)u2(s2,α(s1),e)
#Proposition A.8
Consider Φ∈PC(E), ν∗∈P(E) and p∈[0,1], and denote Ψ:=pΦ+(1−p)ν∗. Suppose that (S1,S2,A,T,α,u1,u2) is a splitting of (S,E,u). Assume ν∗(A)>0. Fix π∈P(S1). Let ρ∗:T→P(S2) satisfy
ρ∗∈argmaxρ:T→P(S2)minξ∈ΨU(π⊗αρ,ξ)
Denote Γ:=supu2−infu2. Then, for any μ0∈Φ s.t. μ0(A)>0
U(π⊗αρ∗,μ0)≥maxρ:T→P(S2)minμ∈ΦU(π⊗αρ,μ)−2Γμ0(A)dtv(μ0∣A,ν∗∣A)
#Proof of Proposition A.8
Let τ∗:T→P(S2) satisfy
τ∗∈argmaxρ:T→P(S2)minμ∈ΦU(π⊗αρ,μ)
Denote u∗:=minμ∈ΦU(π⊗ατ∗,μ). Consider any μ∈Φ and denote ξ:=pμ+(1−p)ν∗. We have
U(π⊗ατ∗,ξ)=pU(π⊗ατ∗,μ′)+(1−p)U(π⊗ατ∗,ν∗)
U(π⊗ατ∗,ξ)≥pu∗+(1−p)U(π⊗ατ∗,ν∗)
Applying Proposition A.6 to the right hand side
U(π⊗ατ∗,ξ)≥pu∗+(1−p)(Eπ×ν∗[u1]+ν∗(A)Et∼α∗π[Eτ∗(t)×ν∗[u2∣A]])
U(π⊗ατ∗,ξ)≥pu∗+(1−p)(Eπ×ν∗[u1]+ν∗(A)(Et∼α∗π[Eτ∗(t)×μ0[u2∣A]]−Γdtv(μ0∣A,ν∗∣A)))
Also, we have
U(π⊗ατ∗,μ0)≥u∗
Applying Proposition A.6 again
Eπ×μ0[u1]+μ0(A)Et∼α∗π[Eτ∗(t)×μ0[u2∣A]]≥u∗
Et∼α∗π[Eτ∗(t)×μ0[u2∣A]]≥u∗−Eπ×μ0[u1]μ0(A)
Combining the inequalities
U(π⊗ατ∗,ξ)≥pu∗+(1−p)(Eπ×ν∗[u1]+ν∗(A)(u∗−Eπ×μ0[u1]μ0(A)−Γdtv(μ0∣A,ν∗∣A)))
U(π⊗ατ∗,ξ)≥pu∗+(1−p)(Eπ×ν∗[u1]+ν∗(A)μ0(A)(u∗−Eπ×μ0[u1]−Γμ0(A)dtv(μ0∣A,ν∗∣A)))
U(π⊗ατ∗,ξ)≥(p+(1−p)ν∗(A)μ0(A))u∗+(1−p)(Eπ×ν∗[u1]−ν∗(A)μ0(A)(Eπ×μ0[u1]+Γμ0(A)dtv(μ0∣A,ν∗∣A)))
The above inequality holds for any ξ, therefore
minξ∈ΨU(π⊗ατ∗,ξ)≥(p+(1−p)ν∗(A)μ0(A))u∗+(1−p)(Eπ×ν∗[u1]−ν∗(A)μ0(A)(Eπ×μ0[u1]+Γμ0(A)dtv(μ0∣A,ν∗∣A)))
By definition of ρ∗, it follows that
minξ∈ΨU(π⊗αρ∗,ξ)≥(p+(1−p)ν∗(A)μ0(A))u∗+(1−p)(Eπ×ν∗[u1]−ν∗(A)μ0(A)(Eπ×μ0[u1]+Γμ0(A)dtv(μ0∣A,ν∗∣A)))
Denote ξ0:=pμ0+(1−p)ν∗ and u:=U(π⊗αρ∗,μ0). We have
U(π⊗αρ∗,ξ0)=pu+(1−p)U(π⊗αρ∗,ν∗)
Applying Proposition A.6 to the right hand side
U(π⊗αρ∗,ξ0)=pu+(1−p)(Eπ×ν∗[u1]+ν∗(A)Et∼α∗π[Eρ∗(t)×ν∗[u2∣A]])
U(π⊗αρ∗,ξ0)≤pu+(1−p)(Eπ×ν∗[u1]+ν∗(A)(Et∼α∗π[Eρ∗(t)×μ0[u2∣A]]+Γdtv(μ0∣A,ν∗∣A))
On the other hand, Proposition A.6 implies that
Eπ×μ0[u1]+μ0(A)Et∼α∗π[Eρ∗(t)×μ0[u2∣A]]=u
Et∼α∗π[Eρ∗(t)×μ0[u2∣A]]=u−Eπ×μ0[u1]μ0(A)
Substituting in the inequality, we get
U(π⊗αρ∗,ξ0)≤pu+(1−p)(Eπ×ν∗[u1]+ν∗(A)(u−Eπ×μ0[u1]μ0(A)+Γdtv(μ0∣A,ν∗∣A))
U(π⊗αρ∗,ξ0)≤pu+(1−p)(Eπ×ν∗[u1]+ν∗(A)μ0(A)(u−Eπ×μ0[u1]+Γμ0(A)dtv(μ0∣A,ν∗∣A))
U(π⊗αρ∗,ξ0)≤(p+(1−p)ν∗(A)μ0(A))u+(1−p)(Eπ×ν∗[u1]−ν∗(A)μ0(A)(Eπ×μ0[u1]−Γμ0(A)dtv(μ0∣A,ν∗∣A))
Observing that U(π⊗αρ∗,ξ0)≥minξ∈ΨU(π⊗αρ∗,ξ) we can combine the inequalities, we get
(<