Logical counterfactuals and differential privacy

4th Feb 2018

Nisan


Edit: This article has major flaws. See my comment below.

**Comment (Nisan):** This doesn't quite work. The theorem and examples only work if you maximize the unconditional mutual information $H(X;Y)$, not the conditional mutual information $H(X;Y \mid A)$. And the choice of $Y$ is doing a lot of work; it's not enough to make it "sufficiently rich".

This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.

## Summary

For a logical inductor $P$, define logical counterfactuals by

$$P_n(\phi \mid \psi) := \sum_y P_k(\phi \mid \psi \wedge Y = y) \, P_n(Y = y)$$

for a suitable $k < n$ and a random variable $Y$ independent of $\psi$ with respect to $P_k$. Using this definition, one can construct agents that perform well in ASP-like problems.
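As a toy numerical illustration of this mixture (ordinary probabilities standing in for inductor beliefs; all numbers hypothetical):

```python
# Toy illustration of the counterfactual as a mixture: stage-k conditional
# beliefs about phi, reweighted by stage-n beliefs about Y.
p_k_phi_given_psi_y = {0: 0.2, 1: 0.9}  # P_k(phi | psi and Y=y)
p_n_y = {0: 0.3, 1: 0.7}                # P_n(Y=y)

# P_n(phi | psi) := sum over y of P_k(phi | psi and Y=y) * P_n(Y=y)
p_n_phi_given_psi = sum(p_k_phi_given_psi_y[y] * p_n_y[y] for y in p_n_y)
print(p_n_phi_given_psi)  # 0.2*0.3 + 0.9*0.7 = 0.69
```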

## Motivation

Recall the Agent Simulates Predictor problem:

$$U_n = 10^6 \, P_{n-1}(A_n = 1) + 10^3 \, \mathbf{1}(A_n = 2)$$

Here $A_n = 1$ is one-boxing, $A_n = 2$ is two-boxing, and the predictor is the earlier belief state $P_{n-1}$.
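To make the payoff concrete, here's a minimal sketch; `predictor_prob` and `action` are hypothetical stand-ins for $P_{n-1}(A_n = 1)$ and $A_n$:

```python
# Minimal sketch of the ASP payoff; `predictor_prob` stands in for
# P_{n-1}(A_n = 1) and `action` for A_n (1 = one-box, 2 = two-box).
def asp_utility(predictor_prob: float, action: int) -> float:
    return 1e6 * predictor_prob + 1e3 * (action == 2)

print(asp_utility(1.0, 1))  # one-box, predictor saw it coming: 1,000,000
print(asp_utility(0.0, 2))  # two-box, predictor saw it coming: 1,000
```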

Naively, we want to solve this by argmaxing:

$$A_n = \operatorname{argmax}_a \, E_n[U_n \mid A_n = a]$$

Hopefully, $P_n(A_n = 1) \approx 1$, $P_{n-1}(A_n = 1) \approx 1$, and $E_n[U_n \mid A_n = 1] \approx 10^6$. Also, two-boxing should be less attractive than one-boxing:

$$E_n[U_n \mid A_n = 2] \approx 10^3$$

However, if we make this well-defined with $\varepsilon$-exploration, we'll get

$$E_n[U_n \mid A_n = 2] \approx 10^6 + 10^3$$

since the rare exploration event is invisible to $P_{n-1}$, so conditioning on $A_n = 2$ leaves $P_{n-1}(A_n = 1) \approx 1$. But then the agent will two-box, contradiction. Instead we'd like to use predictable exploration and set

$$E_n[U_n \mid A_n = 2] := E_k[U_n \mid A_n = 2]$$

for $k$ small enough that the right-hand side is sensible. Let's see how.

## Predictable exploration

Choose $k \ll n$ so that $P_k(A_n = 2) \gg 0$. Our agent decides whether to explore at stage $k$, and uses its beliefs at stage $k$ as a substitute for counterfactuals:

$$\begin{aligned}
\mathrm{explore}_0 &:= P_k(\mathrm{explore}_0) < \varepsilon \\
\mathrm{explore}_1 &:= \begin{cases} 1 & \text{if } P_k(\mathrm{explore}_1 = 1) < \frac{1}{2} \\ 2 & \text{otherwise} \end{cases} \\
\forall a \quad E_n(\phi \mid A = a) &:= E_k(\phi \mid A = a \wedge \mathrm{explore}_0) \quad \text{if } P_n(A = a) < \delta \\
A_n &:= \begin{cases} \mathrm{explore}_1 & \text{if } \mathrm{explore}_0 \\ \operatorname{argmax}_a E_n[U_n \mid A_n = a] & \text{otherwise} \end{cases}
\end{aligned}$$

Here $\varepsilon, \delta$ are small positive numbers. It's easy to see that, under reasonable assumptions, this agent one-boxes on Agent Simulates Predictor. But it can't use the full strength of $P_n$ in its counterfactual reasoning, and this is a problem.
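Here's a schematic sketch of this decision rule, with plain randomness standing in for the self-referential exploration sentences and a lookup table standing in for the stage-$k$ estimates; a cartoon of the agent's shape, not the construction itself:

```python
import random

def agent_step(actions, epsilon, counterfactual_value):
    """Schematic predictable-exploration agent. `counterfactual_value(a)`
    stands in for E_n[U_n | A_n = a], i.e. E_k[U_n | A_n = a, explore_0]
    for rarely-taken actions."""
    if random.random() < epsilon:      # stands in for explore_0
        return random.choice(actions)  # stands in for explore_1
    return max(actions, key=counterfactual_value)

# ASP-like stand-in values: the agent one-boxes except when exploring.
print(agent_step([1, 2], 0.01, {1: 1e6, 2: 1e3}.get))
```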

## Differential privacy

To illustrate the problem, add a term to the utility function that sometimes rewards two-boxing:

$$U_n = 10^6 \, P_{n-1}(A_n = 1) + 10^3 \, \mathbf{1}(A_n = 2) + 10^6 \, \mathbf{1}(A_n = 2 \wedge X_{n-1})$$

$$X_{n-1} := P_{n-1}(X_{n-1}) < \tfrac{1}{2}$$

The agent should two-box if and only if $X_{n-1}$. Assuming that's the case, and that $P_{n-1}$ knows this, we have:

$$\begin{aligned}
P_{n-1}(A_n = 1) &= \tfrac{1}{2} \\
\neg X_{n-1} &\rightarrow E_n[U_n \mid A_n = 1] = \tfrac{1}{2} \cdot 10^6 \\
\neg X_{n-1} &\rightarrow E_n[U_n \mid A_n = 2] = E_k[U_n \mid A_n = 2 \wedge \mathrm{explore}_0] = 10^3 + \tfrac{1}{2} \cdot 10^6
\end{aligned}$$

So if $\neg X_{n-1}$, two-boxing is the more attractive option, which is a contradiction. (I'm rounding $\varepsilon$ to zero for simplicity.)
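Plugging in the numbers (a quick check, with $\varepsilon$ rounded to zero as in the text):

```python
# Under ¬X_{n-1}: the factual estimate for one-boxing uses P_{n-1}(A_n=1) = 1/2,
# while the stale stage-k counterfactual for two-boxing still gives X_{n-1}
# probability about 1/2.
one_box = 0.5 * 1e6        # E_n[U_n | A_n = 1] = (1/2) * 10^6
two_box = 1e3 + 0.5 * 1e6  # E_k[U_n | A_n = 2, explore_0] = 10^3 + (1/2) * 10^6
print(two_box > one_box)   # True: the agent prefers two-boxing, contradiction
```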

The problem is that the counterfactual has to rely on $P_k$'s imperfect knowledge of $X_{n-1}$. We want to combine $P_k$'s ignorance of $\mathrm{explore}_0$ with $P_n$'s knowledge of $X_{n-1}$.

If $X$ is independent of $A$ conditioned on $\mathrm{explore}_0$ with respect to $P_k$, then we can do this:

$$\begin{aligned}
E_k[U \mid A = a \wedge \mathrm{explore}_0] &= \sum_x E_k[U \mid A = a \wedge \mathrm{explore}_0 \wedge X = x] \, P_k(X = x \mid A = a \wedge \mathrm{explore}_0) \\
&= \sum_x E_k[U \mid A = a \wedge \mathrm{explore}_0 \wedge X = x] \, P_k(X = x \mid \mathrm{explore}_0)
\end{aligned}$$

Then replace $P_k(X = x \mid \mathrm{explore}_0)$ with $P_n(X = x \mid \mathrm{explore}_0)$:

$$E_n[U \mid A = a \wedge \mathrm{explore}_0] := \sum_x E_k[U \mid A = a \wedge \mathrm{explore}_0 \wedge X = x] \, P_n(X = x \mid \mathrm{explore}_0)$$

This is more accurate than $E_k[U \mid A = a \wedge \mathrm{explore}_0]$, and unbiased.
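A toy sketch of this reweighting, with hypothetical numbers loosely following the worked example later in this section; once $P_n$ is confident that $\neg X$, the mixture collapses to the $X = \mathsf{F}$ term:

```python
# Reweighted counterfactual for a = 2: keep E_k's estimates conditional on
# each value of X, but weight them by P_n's better-informed beliefs about X.
e_k = {0: 1e3, 1: 0.5 * 1e6 + 1e3 + 1e6}  # E_k[U | A=2, explore_0, X=x]
p_n = {0: 1.0, 1: 0.0}                    # P_n(X=x | explore_0): P_n knows ¬X
e_n = sum(e_k[x] * p_n[x] for x in p_n)   # E_n[U | A=2, explore_0]
print(e_n)  # 1000.0: with P_n's knowledge of X, two-boxing looks bad again
```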

If $X$ is not independent of $A$ conditional on $\mathrm{explore}_0$, we can introduce an auxiliary variable and construct a version of $X$ that is independent. This construction is a solution to the following differential privacy problem: make a random variable $Y$ that is a function of $X$ and independent randomness, maximizing the conditional mutual information $H(X;Y \mid A)$, subject to the constraint that $A$ is independent of $Y$. Using the identity

$$H(X \mid A) = H(X;Y \mid A) + H(X \mid A, Y)$$

we see that the maximum is attained when $H(X \mid A, Y) = 0$, which means that $X$ is a function of $A$ and $Y$.
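As a sanity check on this identity (the post's $H(X;Y \mid A)$ is the conditional mutual information, often written $I(X;Y \mid A)$), here is a numerical verification on a toy joint distribution, chosen so that $Y$ is independent of $A$ and $H(X \mid A, Y) = 0$:

```python
from math import log2

# Toy joint distribution P(A, X, Y) (hypothetical numbers): X = Y when A = 0
# and X = 1 - Y when A = 1, so X is a function of (A, Y).
P = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

def marg(keep):
    """Marginalize the joint onto the coordinate positions in `keep`."""
    out = {}
    for triple, p in P.items():
        key = tuple(triple[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

pA, pAX, pAY = marg([0]), marg([0, 1]), marg([0, 2])

# H(X|A), H(X|A,Y), and H(X;Y|A), each computed from its definition.
h_x_a = -sum(p * log2(p / pA[(a,)]) for (a, x), p in pAX.items())
h_x_ay = -sum(p * log2(p / pAY[(a, y)]) for (a, x, y), p in P.items())
i_xy_a = sum(p * log2(pA[(a,)] * p / (pAX[(a, x)] * pAY[(a, y)]))
             for (a, x, y), p in P.items())

print(h_x_a, i_xy_a + h_x_ay)  # both 1.0, as the identity requires
```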

Now here's the construction of $Y$:

Let $\mathcal{X}$ be the finite set of possible values of $X$, and let $\mathcal{A}$ be the finite set of possible values of $A$. We'll iteratively construct a set $\mathcal{Y}$ and define a random variable $Y$ taking values in $\mathcal{Y}$. To start with, let $\mathcal{Y} = \emptyset$.

Now choose

$$(a, x) := \operatorname*{argmin}_{\substack{a \in \mathcal{A},\ x \in \mathcal{X} \\ P(X = x,\, Y \notin \mathcal{Y} \mid A = a) > 0}} P(X = x,\, Y \notin \mathcal{Y} \mid A = a)$$

and for each $a' \in \mathcal{A} \setminus \{a\}$, choose some $f(a') \in \mathcal{X}$ such that $P(X = f(a'),\, Y \notin \mathcal{Y} \mid A = a') > 0$. Then make a random binary variable $T_{a'}$ such that

$$P(T_{a'} \wedge X = f(a') \wedge Y \notin \mathcal{Y} \mid A = a') = P(X = x \wedge Y \notin \mathcal{Y} \mid A = a)$$

Then let $y$ be the event defined by

$$(X = x \wedge Y \notin \mathcal{Y} \wedge A = a) \vee \bigvee_{a' \in \mathcal{A} \setminus \{a\}} \left( T_{a'} \wedge X = f(a') \wedge Y \notin \mathcal{Y} \wedge A = a' \right)$$

and add $y$ to $\mathcal{Y}$. After repeating this process at most $|\mathcal{X}||\mathcal{A}|$ times, we are done.
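Here is a sketch of the whole construction on an explicit finite distribution, with ordinary conditional probabilities standing in for $P_k$'s beliefs and the conditioning on $\mathrm{explore}_0$ left implicit. Each pass through the loop plays the roles of the argmin, the choice of $f(a')$, and the coins $T_{a'}$: it carves mass $m$ off one value of $X$ under each action and bundles it into a single value $y$ of $Y$:

```python
def build_y(cond, tol=1e-12):
    """Construct Y from cond[a][x] = P(X=x | A=a).

    Returns a list of (mass, assignment) pairs, one per value y of Y:
    P(Y=y | A=a) = mass for every a (so Y is independent of A), and
    assignment[a] is the value X takes given A=a and Y=y."""
    residual = {a: dict(xs) for a, xs in cond.items()}  # P(X=x, Y not yet chosen | A=a)
    ys = []
    while True:
        positive = [(p, a, x) for a, xs in residual.items()
                    for x, p in xs.items() if p > tol]
        if not positive:
            break
        m, a_min, x_min = min(positive)  # the argmin pair (a, x), with mass m
        assignment = {}
        for a in residual:               # pick f(a') for every other action
            x = x_min if a == a_min else next(
                x for x, p in residual[a].items() if p > tol)
            residual[a][x] -= m          # the coin T_{a'} carves off mass m
            assignment[a] = x
        ys.append((m, assignment))
    return ys

# Example where X is not independent of A:
cond = {1: {0: 0.5, 1: 0.5}, 2: {0: 0.25, 1: 0.75}}
for mass, assignment in build_y(cond):
    print(f"P(Y=y | A=a) = {mass}:  X given (A, y) = {assignment}")
```

Every value of $Y$ ends up with the same conditional probability under each action, so $Y$ is independent of $A$, and given $A$ and $Y$ the value of $X$ is determined.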

We can do this with a logical inductor as well. In general, to get a sentence $T$ such that $P_k(T \wedge B \mid C) \approx p$, take the self-referential sentence (via the diagonal lemma) $T := (P_k(T \wedge B \mid C) < p) \wedge B \wedge C$.
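As a cartoon of why such a diagonal sentence pins its own probability near $p$: repeatedly nudging a belief toward the truth value of the very sentence it evaluates settles near $p$. This stand-in dynamic is not a logical inductor, just an illustration of the fixed point:

```python
# Belief b tracks the sentence "b < p". Above p the sentence is false and b
# falls; below p it is true and b rises; so b settles near p.
p, b, eta = 0.25, 0.9, 0.01
for _ in range(2000):
    truth = 1.0 if b < p else 0.0
    b += eta * (truth - b)
print(b)  # hovers near p = 0.25
```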

Now given random variables $U$ and $A$, and some informative sentences $\phi_1, \dots, \phi_\ell$, let $X \in \{\mathsf{T}, \mathsf{F}\}^\ell$ be the random variable encoding the truth values of $\phi_1, \dots, \phi_\ell$. The above construction, carried out approximately and conditional on $\mathrm{explore}_0$, gives us a random variable $Y$ that is approximately independent of $A$ conditional on $\mathrm{explore}_0$ with respect to $P_k$. Now we define

$$E_n[U \mid A = a] := \sum_y E_k[U \mid A = a \wedge \mathrm{explore}_0 \wedge Y = y] \, P_n(Y = y \mid \mathrm{explore}_0)$$

whenever $P_n(A = a) < \delta$.

This succeeds on the problem at the beginning of this section: Assume $A_n = 2 \leftrightarrow X_{n-1}$, and assume that $P_{n-1}$ knows this. Then:

$$\begin{aligned}
P_{n-1}(A_n = 1) &= \tfrac{1}{2} \\
\neg X_{n-1} &\rightarrow E_n[U_n \mid A_n = 1] = \tfrac{1}{2} \cdot 10^6 \\
\neg X_{n-1} &\rightarrow E_n[U_n \mid A_n = 2] = E_k[U_n \mid A_n = 2 \wedge \mathrm{explore}_0 \wedge \neg X_{n-1}] = 10^3 \\
X_{n-1} &\rightarrow E_n[U_n \mid A_n = 2] = \tfrac{1}{2} \cdot 10^6 + 10^3 + 10^6 \\
X_{n-1} &\rightarrow E_n[U_n \mid A_n = 1] = E_k[U_n \mid A_n = 1 \wedge \mathrm{explore}_0 \wedge X_{n-1}] = 10^6
\end{aligned}$$

which does not lead to contradiction. In fact, there are agents like this that do at least as well as any constant agent:

## Theorem

Let $U_n(P, A)$ be a utility function defined with metasyntactic variables $n$, $P$, and $A$. It must be computable in polynomial time as a function of $A$, values $P_{f_i(n)}(A = a)$, and sentences of the form $X := P_{f_i(n)}(X) < p$, where the $f_i$ can be any polynomial-time functions that don't grow too slowly and such that $f_i(n) < n$. Then there exists a logical inductor $P$ such that for every $n$, there exist $k < n$, $\varepsilon, \delta > 0$, and a pseudorandom variable $Y$ such that the agent $A$ defined below performs at least as well on $U_n$ as any constant agent, up to a margin of error that approaches $0$ as $n \to \infty$:

$$\begin{aligned}
\mathrm{explore}_0 &:= P_k(\mathrm{explore}_0) < \varepsilon \\
\mathrm{explore}_1 &:= \begin{cases} a_1 & \text{if } P_k(\mathrm{explore}_1 = a_1) < \frac{1}{\ell} \text{; else} \\ a_2 & \text{if } P_k(\mathrm{explore}_1 = a_2) < \frac{1}{\ell} \text{; else} \\ \ \vdots \\ a_\ell & \text{otherwise} \end{cases} \\
\forall a \quad E_n(\phi \mid A = a) &:= \sum_y E_k(\phi \mid A = a \wedge \mathrm{explore}_0 \wedge Y = y) \, P_n(Y = y \mid \mathrm{explore}_0) \quad \text{if } P_n(A = a) < \delta \\
A_n &:= \begin{cases} \mathrm{explore}_1 & \text{if } \mathrm{explore}_0 \\ \operatorname{argmax}_a E_n[U_n \mid A_n = a] & \text{otherwise} \end{cases}
\end{aligned}$$
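As with the simpler agent above, here is a schematic sketch of the theorem's agent, with stub dictionaries standing in for inductor beliefs and plain randomness for the exploration sentences; all numbers and names are hypothetical:

```python
import random

def general_agent(actions, epsilon, e_k, p_n_y):
    """e_k[(a, y)] stands in for E_k[U_n | A_n=a, explore_0, Y=y];
    p_n_y[y] stands in for P_n(Y=y | explore_0)."""
    if random.random() < epsilon:      # stands in for explore_0
        return random.choice(actions)  # stands in for explore_1: uniform
    def counterfactual(a):             # the mixture over values of Y
        return sum(e_k[(a, y)] * q for y, q in p_n_y.items())
    return max(actions, key=counterfactual)

# Toy usage: with P_n confident that Y = 0, the agent picks action 1.
e_k = {(1, 0): 1e6, (1, 1): 5e5, (2, 0): 1e3, (2, 1): 1.5e6}
print(general_agent([1, 2], 0.01, e_k, {0: 1.0, 1: 0.0}))
```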

## Proof sketch

Choose $k$ smaller than the strength parameter of the weakest predictor in $U_n$. If $a_n$ is the best constant policy for $U_n$, assume $A_n = a_n$. Since $P_n$ can compute $U_n$, our agent's factual estimate $E_n[U_n \mid A_n = a_n]$ is accurate, and the counterfactual estimate $E_n[U_n \mid A_n = a']$ for $a' \neq a_n$ is an accurate estimate of the utility assigned to the constant policy $a'$, as long as we make $Y$ rich enough. So the agent will choose $a_n$. Thus we have an implication of the form "if $P$ believes $A_n = a_n$, then $A_n = a_n$ is true", and so we can create a logical inductor $P$ that always believes $A_n = a_n$ for every $n$ by adding a trader with a large budget that bids up the price of $A_n = a_n$.

## Isn't this just UDTv2?

This is much less general than UDTv2. If you like, you can think of this as an agent that at time $k$ chooses a program to run, and then runs that program at time $n$, except the program always happens to be "argmax over this kind of counterfactual".

Also, it doesn't do policy selection.

## Next steps

Instead of handing the agent a pseudorandom variable $Y$ that captures everything important, I'd like to have traders inside a logical inductor figure out what $Y$ should be on their own.

Also, I'd rather not have to hand the agent an optimal value of $k$.

Also, I hope that these counterfactuals can be used to do policy selection and win at counterfactual mugging.