Existence of distributions that are expectation-reflective and know it

We prove the existence of a probability distribution over a theory $T$ with the property that for certain definable quantities $φ$ , the expectation of the value of a function $E [┌ φ ┐]$ is accurate, i.e. it equals the actual expectation of $φ$ ; and with the property that it assigns probability 1 to $E$ behaving this way. This may be useful for self-verification, by allowing an agent to satisfy a reflective consistency property and at the same time believe itself or similar agents to satisfy the same property. Thanks to Sam Eisenstat for listening to an earlier version of this proof, and pointing out a significant gap in the argument. The proof presented here has not been vetted yet.

Problem statement

Given a distribution $P$ coherent over a theory $A$, and some real-valued function $f$ on completions of $A$, we can define the expectation $E [f]$ of $f$ according to $P$. Then we can relax the probabilistic reflection principle by asking that for some class of functions $f$, we have that $E [┌ E [f] ┐] = E [f]$, where the $E$ inside the quotation corners is a symbol in the language of $A$ meant to represent the expectation $E$ itself. Note that this notion of expectation-reflection is weaker than probabilistic reflection, since our distribution is now permitted to, for example, assign a bunch of probability mass to over- and under-estimates of $E [f]$, as long as they balance out.
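
To make "balancing out" concrete, here is a toy sketch in Python. It is not part of the construction: a finite list of equally likely "worlds" stands in for the completions of $A$, and the names `worlds`, `expect` and `E_of_f` are hypothetical. The toy distribution puts no mass on the correct value of $E [f]$, yet its expectation of the guess equals $E [f]$.

```python
from fractions import Fraction as Fr

# Toy assumption: two equally likely "worlds", each giving a value to f
# and a value to what the symbol E says about f.
worlds = [
    {"prob": Fr(1, 2), "f": Fr(0), "E_of_f": Fr(1, 5)},  # under-estimate of E[f]
    {"prob": Fr(1, 2), "f": Fr(1), "E_of_f": Fr(4, 5)},  # over-estimate of E[f]
]

def expect(key):
    """Expectation of the quantity `key` under the toy distribution."""
    return sum(w["prob"] * w[key] for w in worlds)

actual = expect("f")          # plays the role of E[f]
reflected = expect("E_of_f")  # plays the role of E applied to the quoted E[f]

# Probability assigned to the symbol E giving exactly the right answer.
mass_on_correct_guess = sum(w["prob"] for w in worlds if w["E_of_f"] == actual)
```

Here `reflected == actual` holds while `mass_on_correct_guess` is zero, which is the kind of behavior probabilistic reflection would rule out.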

Christiano asked whether it is possible to have a distribution that satisfies this reflection principle, and also assigns probability 1 to the statement that $E$ satisfies this reflection principle. This was not possible for strong probabilistic reflection, but it turns out to be possible for expectation reflection, for some choice of the functions $f$ .

Sketch of the approach

(This is a high level description of what we are doing, so many concepts will be left vague until later.)

Christiano et al. applied Kakutani’s theorem to the space of coherent $P$. Instead we will work in the space of expectations over some theory $T$, where an expectation over a theory is, roughly speaking, a function from the set of variables provably defined by that theory, into the intervals proved to bound each variable. These are essentially interchangeable with coherent probability distributions over $T$. The point of doing this is to make the language simpler; for example, reflection statements will mention a single symbol representing an expectation, rather than a complicated formula defining the expectation in terms of probability.

We will again apply Kakutani’s theorem, now requiring that some expectation $G$ reflects $F$ only when $G$ expects $E$ to behave like $F$ , and when $G$ assigns some significant probability to the statement that $E$ is reflective. This confidence in reflection must increase the closer that $F$ is to being reflective. Then a fixed point of this correspondence will be expectation-reflective, and will assign probability 1 to $E$ being expectation-reflective.

The form of our correspondence will make most of the conditions of Kakutani’s theorem straightforward. The main challenge will be to show non-emptiness, i.e. that there is some expectation that reflects a given $F$ and believes in reflection to some extent. In the case of probabilistic reflection, this does not go through at all, since if we reflect a non-reflective probability distribution exactly, we must assign probability 0 to reflection.

However, in the case of expectations, we can mix different expectations together while maintaining expectation-reflection, by carefully balancing the mixture. The main idea will be to take a distribution $G_{H}$ that believes in some reflective expectation $H$, take another distribution $G_{J}$ that believes in some pseudo-expectation $J$, and mix them together. The resulting mixture $G$ will somewhat expect $E$ to be reflective, since $G_{H}$ expects this, and by a good choice of $J$ counterbalancing $H$, $G$ will expect $E$ to behave like $F$.

Before carrying out this approach, we need some formal notions and facts about expectations, given in Sections 3 and 4. Also, in order to be careful about what we mean by an expectation, a pseudo-expectation, and a variable, we will in Section 5 develop a base theory $T$ over which our distributions will be defined. Then Section 6 will give the main theorem, following the above sketch. Section 7 discusses the meaning of these results and extensions to definable reflection.

Basic definitions and facts about expectations

We will work with probability distributions (or, in a moment, expectations) that are coherent over some base theory $T$ in a language that can talk about rationals, functions, and has a symbol $E$ .

Random variables for theories and their bounds

These notions are due to Fallenstein.

We are interested in taking expectations of quantities expressed in the language of $T$ . This amounts to viewing a probability distribution $P$ coherent over $T$ as a measure on the Stone space $S_{T}$ , and then asking for the expectation

$E [f] := \int_{S_{T}} f d P .$

A natural choice for the kind of random variable $f$ to look at is those values definable over $T$, i.e. formulas $φ (x)$ such that $T ⊢ \exists! x \in R : φ (x)$. Then any completion of $T$ will make statements of the form $\forall r \in R : φ (r) \to r > a$ for various $a \in Q$ in a way consistent with $φ$ holding on a unique real, and perhaps we can extract a value for the random variable $φ$.

However, we have to be a little careful. If this is all that $T$ proves about $φ$, then there will be completions of $T$ which, for every $a \in Q$, contain the statement $\forall r \in R : φ (r) \to r > a$. Then there is no real number reasonably corresponding to $φ$. Even if this is not an issue, there are distributions which assign non-negligible probabilities to a sequence of completions of $T$ that put quickly growing values on $φ$, such that the integral $E [φ]$ does not exist.

Therefore we also require that $T$ proves some concrete bounds on the real numbers that can satisfy $φ (x)$ . Then we will be able to extract values for $φ$ from completions of $T$ and define the expectation of $φ$ according to $P$ .

Definition

[Definition of bounded variables $Var (A)$ for $A$ .]

For any consistent theory $A$ , the set $Var (A)$ is the set of formulas $φ (x)$ such that $A$ proves $φ (x)$ is well-defined and in some particular bounds, i.e.:

$φ \in Var (A) \Leftrightarrow \exists a, b \in Q : A ⊢ [\exists! x \in R : φ (x)] \land [\forall x \in R : φ (x) \to x \in [a, b]] .$

Elements of $Var (A)$ are called $A$ -variables. $=:$

Definition

[Definition of $A$ -bounds on variables.]

For $φ \in Var (A)$ , let $[a, b]_{A, φ}$ be the complete bound put on $φ$ by $A$ , taking into account all bounds on $φ$ proved by $A$ , i.e.

$[a, b]_{A, φ} := ⋂ {[s, t] ∣ s, t \in Q, A ⊢ φ \in [s, t]} .$ $=:$

Note that the $A$ -bound $[a, b]_{A, φ}$ on a variable $φ \in Var (A)$ is a well-defined non-empty closed interval; it is the intersection of non-disjoint closed intervals all contained in some rational interval, by the definition of $Var (A)$ and the fact that $A$ is consistent.
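
As a small illustration of this intersection, here is a Python sketch under a simplifying assumption: only finitely many proved bounds are considered, whereas the actual $A$ -bound intersects all of them. The function name `complete_bound` and the sample intervals are hypothetical.

```python
from fractions import Fraction as Fr

def complete_bound(proved_intervals):
    """Intersect finitely many closed rational intervals [s, t]."""
    lo = max(s for s, _ in proved_intervals)
    hi = min(t for _, t in proved_intervals)
    if lo > hi:
        # Two proved bounds are disjoint; a consistent theory cannot do this.
        raise ValueError("disjoint bounds: the theory would be inconsistent")
    return lo, hi

# Three bounds that a theory might prove for the same variable.
proved = [(Fr(0), Fr(1)), (Fr(1, 4), Fr(2)), (Fr(-1), Fr(3, 4))]
bound = complete_bound(proved)
```

With these sample intervals the result is the closed interval from 1/4 to 3/4, and the consistency of $A$ is what rules out the error branch.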

Expectations and pseudo-expectations over a theory

The definition of $Exp (A)$ and the theorem in Section 4 are based on a comment of Eisenstat.

Now we define expectations over a theory, analogously to probability distributions. Here linearity will play the role of coherence.

Definition

[Sum of two $A$ -variables.] For $φ, ψ \in Var (A)$ , we write $φ + ψ$ for the sum of the two variables, i.e.:

$(φ + ψ) (x) \Leftrightarrow \exists q, r \in R : x = q + r \land φ (q) \land ψ (r) .$

Then $φ + ψ \in Var (A)$ for reasonable $A$ . $=:$

Definition

[Expectations $Exp (A)$ over a theory $A$ .] An expectation over a theory $A$ is a function $E : Var (A) \to R$ such that for all $φ, ψ \in Var (A)$ :

(In $A$ -bounds) $E [φ] \in [a, b]_{A, φ}$ , i.e. $E$ takes values in the bounds proved by $A$ , and
(Linear) $E [φ + ψ] = E [φ] + E [ψ]$ .

$=:$

In order to carry out the counterbalancing argument described above, we need some rather extreme affine combinations of expectations. So extreme that they will not even be proper expectations; so we define pseudo-expectations analogously to expectations but with much looser bounds on their values.

Definition

[Pseudo-expectations $PseudoExp (A)$ over a theory $A$ .] A pseudo-expectation over a theory $A$ is a function $E : Var (A) \to R$ such that for all $φ, ψ \in Var (A)$ :

(Loosely in $A$ -bounds) If $φ$ has $A$ -bound $[a, b]_{A, φ}$ , we have that $E [φ] \in [a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]$ , and
(Linear) $E [φ + ψ] = E [φ] + E [ψ]$ .

$=:$

For any theories $A \subset B$ , we have that $Exp (B) \subset Exp (A) \subset PseudoExp (A)$ and $Exp (B) \subset PseudoExp (B) \subset PseudoExp (A)$ . We are implicitly restricting elements of $Exp (B)$ and $PseudoExp (B)$ to $Var (A)$ in these comparisons, and will do so freely in what follows. We take the product topology on both $Exp (A)$ and $PseudoExp (A)$ .
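
The loose bounds can be computed directly. The following Python sketch treats the Gödel number $┌ φ ┐$ as a plain integer `code`; the helper names `loose_bound` and `in_interval` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction as Fr

def loose_bound(a, b, code):
    """Loose interval [a - (b-a) 2^code, b + (b-a) 2^code] for a variable
    with A-bound [a, b] and Goedel number `code`."""
    slack = (b - a) * 2**code
    return a - slack, b + slack

def in_interval(x, interval):
    """True when x lies in the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= x <= hi

# A variable bounded in [0, 1] with code 3 may take pseudo-expectation
# values anywhere in [-8, 9].
example = loose_bound(Fr(0), Fr(1), 3)

# A variable whose A-bound is a single point gets no slack at all.
degenerate = loose_bound(Fr(2), Fr(2), 5)
```

Any value inside $[a, b]$ is also inside the loose interval, which is the coordinate-wise reason that $Exp (A) \subset PseudoExp (A)$.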

Isomorphism of expectations and probability distributions

To actually construct elements of $Exp (A)$ , we will use a natural relationship between probability distributions $P$ and expectations $E$ over a theory, proved formally below to be an isomorphism. On the one hand we can get a probability distribution from $E$ by taking the expectation of indicator variables for the truth of sentences; on the other hand we can get an expectation from a probability distribution by integrating a variable over the Stone space of our theory.

Definition

[The value of an $A$ -variable.] For a complete theory $A$ , the value $A (φ)$ of some $A$ -variable $φ$ is $sup {q \in Q ∣ A ⊢ \forall x : φ (x) \to x > q}$ . Since $φ \in Var (A)$ , this value is well-defined, and $A (φ) \in [a, b]_{A, φ}$ . $=:$

Theorem

For any theory $A$ , there is a canonical isomorphism $ι$ between $Exp (A)$ and the space of coherent probability distributions over $A$ , given by:

$ι : Exp (A) \to Δ (A)$

$ι (E) (θ) := E [Ind (θ)],$

where $Ind (θ)$ is the 0-1 valued indicator variable for the sentence $θ$ , i.e. $Ind (θ) := (x = 0 \land \neg θ) \lor (x = 1 \land θ)$ . The alleged inverse $ι^{- 1}$ is given by:

$ι^{- 1} (P) [φ (x)] := \int_{A^{'} \in S_{A}} A^{'} (φ (x)) d P .$

Proof. By the previous discussion, $ι$ and $ι^{- 1}$ are well-defined in the sense that they return functions of the correct type.

$ι^{- 1} (P) \in Exp (A)$

By definition of $Var (A)$ , the integrals in the definition of $ι^{- 1} (P)$ are defined and within $A$ -bounds. For any $φ, ψ \in Var (A)$ and any $a, b \in Q$ , we have that $A ⊢ (\forall x : φ (x) \to x > a) \land (\forall y : ψ (y) \to y > b) \to (\forall z : (φ + ψ) (z) \to z > a + b)$ and

$A ⊢ (\forall x : (φ + ψ) (x) \to x > a) \to \exists b, c : (\forall y : φ (y) \to y > b) \land (\forall z : ψ (z) \to z > c) .$

Thus $ι^{- 1} (P) [φ + ψ] = ι^{- 1} (P) [φ] + ι^{- 1} (P) [ψ]$ , so $ι^{- 1} (P)$ is linear and hence is an expectation.

$ι (E) \in Δ (A)$

For any $θ \in A$ , we have that $A ⊢ Ind (θ) = 1$ , so since $E$ is in $A$ -bounds, $ι (E) (θ) = E [Ind (θ)] = 1$ . Similarly, for any partition of truth into three sentences, $A$ proves the indicators of those sentences have values summing to 1; so $E$ assigns values to their indicators summing to 1, using linearity a few times and the fact that $E$ assigns the same value to variables with $A ⊢ \forall x : φ (x) \leftrightarrow ψ (x)$ .

This last fact follows by considering the $A$ -bound of $[0, 0]$ on the variable $φ (x) + (- ψ (x))$ . Linearity gives that $0 = E [φ (x) + (- ψ (x))] = E [φ (x)] + E [- ψ (x)]$ , so $E [φ (x)] = - E [- ψ (x)]$ . If $φ \equiv ψ$ this gives $E [ψ (x)] = - E [- ψ (x)]$ , so that in general $E [φ (x)] = E [ψ (x)]$ , as desired.

$ι \circ ι^{- 1}$ is identity

For any $P \in Δ (A)$ and any sentence $θ$ , we have

$ι \circ ι^{- 1} (P) (θ) = ι^{- 1} (P) [Ind (θ)] = \int_{A^{'} \in S_{A}} A^{'} (Ind (θ)) d P = P (θ),$

since any completion $A^{'}$ of $A$ with $A^{'} ⊢ θ$ also has $A^{'} ⊢ Ind (θ) = 1$, and any completion $A^{'}$ of $A$ with $A^{'} ⊢ \neg θ$ also has $A^{'} ⊢ Ind (θ) = 0$.
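
The two directions of the dictionary can be checked on a toy model. In the Python sketch below, a finite list of weighted "completions" stands in for the Stone space $S_{A}$; this finiteness and all names (`completions`, `P`, `expectation`, `indicator`) are assumptions made for illustration only.

```python
from fractions import Fraction as Fr

# Toy assumption: three "completions", each with a probability, a truth
# value for a sentence theta, and a value for a [0, 1]-bounded variable phi.
completions = [
    {"prob": Fr(1, 2), "theta": True,  "phi": Fr(1, 4)},
    {"prob": Fr(1, 3), "theta": False, "phi": Fr(1, 2)},
    {"prob": Fr(1, 6), "theta": True,  "phi": Fr(1)},
]

def P(sentence):
    """Probability of a sentence: total mass of completions containing it."""
    return sum(c["prob"] for c in completions if c[sentence])

def expectation(value_of):
    """Analogue of iota^{-1}(P): integrate a variable over the completions."""
    return sum(c["prob"] * value_of(c) for c in completions)

def indicator(sentence):
    """0-1 valued variable for the truth of a sentence."""
    return lambda c: Fr(1) if c[sentence] else Fr(0)

phi = lambda c: c["phi"]

# Going from P to an expectation and back recovers P on theta.
prob_via_expectation = expectation(indicator("theta"))

# The expectation is linear on the sum of two variables.
total = expectation(lambda c: phi(c) + indicator("theta")(c))
```

In this toy model `prob_via_expectation` equals `P("theta")`, and `total` equals the sum of the two separate expectations.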

$ι$ is continuous

Take a $θ$ sub-basis open subset of $Δ (A)$ , the set of distributions assigning probability in $(a, b)$ to $θ$ . The preimage of this set is the set of expectations with $E [Ind (θ)] \in (a, b)$ , which is an open subset of $Exp (A)$ .

$ι^{- 1} \circ ι$ is identity

Take any $E \in Exp (A)$ . We want to show that

$E [φ (x)] = \int_{A^{'} \in S_{A}} A^{'} (φ (x)) d (ι E)$

for all $φ (x) \in Var (A)$ . In the following we will repeatedly apply linearity and the fact shown above that $E$ respects provable equivalence of variables. Take such a $φ (x)$ and assume for clarity that the $A$ -bound of $φ (x)$ is $[0, 1]$ . Then for any $n \in N$ , we have that

$\begin{matrix} E [φ] & = \sum_{k \in [n]} E [φ \, Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))] \\ & = \sum_{k \in [n]} \left( (\frac{k}{n}) E [Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))] + E [(φ - \frac{k}{n}) Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))] \right) . \end{matrix}$

Note that the last interval in these sums is closed instead of half-open. Since $A$ proves that $(φ - \frac{k}{n}) Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))$ is non-negative,

$E [φ] \geq \sum_{k \in [n]} (\frac{k}{n}) E [Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))] .$

By the arguments given earlier, $E [Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))] = \int_{A^{'} \in S_{A}} A^{'} (Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))) d (ι E) .$

Hence

$\begin{matrix} E [φ] & \geq \sum_{k \in [n]} (\frac{k}{n}) \int_{A^{'} \in S_{A}} A^{'} (Ind (φ \in [\frac{k}{n}, \frac{k + 1}{n}))) d (ι E) \\ & = \sum_{k \in [n]} (\frac{k}{n}) ι E (φ \in [\frac{k}{n}, \frac{k + 1}{n})) . \end{matrix}$

As $n \to \infty$, the right-hand side converges to the Lebesgue integral of $φ$ with respect to $ι E$. Combining this with a similar argument giving an upper bound on $E [φ]$, we have that

$E [φ (x)] = \int_{A^{'} \in S_{A}} A^{'} (φ (x)) d (ι E)$ as desired.
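
The lower-sum approximation used above can be seen numerically. The Python sketch below assumes a finite list of weighted completions (a toy stand-in for $S_{A}$) and a variable with bound $[0, 1]$; the names `completions`, `exact_expectation` and `lower_sum` are hypothetical.

```python
from fractions import Fraction as Fr

# Toy assumption: finitely many completions, as (probability, value of phi).
completions = [(Fr(1, 2), Fr(1, 4)), (Fr(1, 3), Fr(1, 2)), (Fr(1, 6), Fr(1))]

def exact_expectation():
    """Integral of phi over the toy Stone space."""
    return sum(p * v for p, v in completions)

def lower_sum(n):
    """sum over k of (k/n) * P(phi in [k/n, (k+1)/n)), last interval closed."""
    total = Fr(0)
    for p, v in completions:
        # Bucket index of v; the value 1 is placed in the last (closed) bucket.
        k = min(int(v * n), n - 1)
        total += Fr(k, n) * p
    return total
```

For every `n`, `lower_sum(n)` is at most `exact_expectation()` and falls short of it by at most `1/n`, so the lower sums converge to the integral as `n` grows.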

$ι^{- 1}$ is continuous

Take a $φ (x)$ sub-basis open set in $Exp (A)$ , the set of expectations assigning a value in $(a, b)$ to $φ$ . Let $P$ be a probability distribution with $ι^{- 1} (P) [φ] \in (a, b)$ . As in the previous section of the proof, we can cut up the bound $[c, d]_{A, φ}$ into finitely many very small intervals. Then any probability distribution that assigns probabilities sufficiently close to those assigned by $P$ to the indicators for $φ$ being in those small intervals, will have an expectation for $φ$ that is also inside $(a, b)$ . This works out to an open set around $P$ , so that the preimage of the $φ (x)$ sub-basis open set is a union of open sets. $⊣$

A base theory that accommodates reflection variables

So, we have a dictionary between distributions and expectations. This will let us build expectations by completing theories and taking expectations according to the resulting 0-1 valued distribution.

Some preparatory work remains, because in order to have the reflection principle $E [E [┌ φ ┐]] = E [φ]$ , we at least want $E [┌ φ ┐]$ to be a variable whenever $φ$ is. Thus we will need a theory $T$ that bounds $E [┌ φ ┐]$ whenever it bounds $φ$ . However, in order to make extreme mixes of elements of $Exp (T)$ possible to reflect into an expectation over $T$ , we will need that all elements of $PseudoExp (T)$ are valid interpretations of $E$ for $T$ .

Stratified definition of the base theory $T$

We start with a theory such as $Z F C$ that is strong enough to talk about rational numbers and so on. We add to the language a symbol $E$ that will represent an expectation. We also add the sentence stating that $E$ is a partial function from $N$ to $R$ , and that $E$ is linear at $φ + ψ$ if it happens to be defined on $φ, ψ,$ and $φ + ψ$ . This gives the theory $T_{0}$ .

Now define inductively the theories $T_{n + 1} \supset T_{n}$ :

$\begin{matrix} T_{n + 1} := T_{n} + & \forall ┌ φ ┐, k \in N : \forall a, b \in Q : \\ & [k \text{ witnesses } T_{n} ⊢ (\exists! x \in R : φ (x)) \land (\forall x : φ (x) \to x \in [a, b])] \\ & \to (\exists! x \in R : E [┌ φ ┐] = x) \land (\forall x : E [┌ φ ┐] = x \to x \in [a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]) \end{matrix}$

In English, this says that $T_{n + 1}$ is $T_{n}$ along with the statement that whenever $T_{n}$ proves that some $φ$ is well-defined and bounded in some interval $[a, b]$ , then it is the case that $E$ is defined on $φ$ and $E [┌ φ ┐]$ is inside the much looser bound $[a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]$ . Intuitively we are adding into $Var (T_{n + 1})$ the variable $E [┌ φ ┐]$ whenever $φ \in Var (T_{n})$ , but we are not restricting its value very much at all. The form of the loose bound on $E [┌ φ ┐]$ is an artifact of the metric we will later put on $Exp (T)$ .

Finally, we define the base theory we will use in the main argument as the limit of the $T_{n}$ , that is: $T := ⋃_{n \in N} T_{n}$ . Note that $T$ is at least (exactly?) as strong as $(T_{0})_{ω}$ , the theory $T_{0}$ with $ω$ -iterated consistency statements, since the loose bounds are the same as the true bounds when the true bound is $[a, a]$ . Also note that it is important that $T_{0}$ is arithmetically sound, or else $T$ may believe in nonstandard proofs and hence put inconsistent bounds on $E$ . I think this restriction could be avoided by making the statement in $T_{n + 1} - T_{n}$ into a schema over specific standard naturals that might be proofs.

Soundness of $T$ over $PseudoExp (T)$

We will be applying Kakutani’s theorem to the space $Exp (T)$ , and making forays into $PseudoExp (T)$ . So we want $T$ to at least be consistent, so that $Exp (T)$ is nonempty, and furthermore we want $T$ to allow for $E$ to be interpreted by anything in $PseudoExp (T)$ .

Recall that a (pseudo)expectation over a theory $A$ is a function $E : Var (A) \to R$ that is linear, and such that given $φ$ with $A$ -bound $[a, b]_{A, φ}$ , we have that $E [φ] \in [a, b]$ (or $E [φ] \in [a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]$ ). As noted before, for any theories $A \subset B$ , we have that $Exp (B) \subset Exp (A) \subset PseudoExp (A)$ and $Exp (B) \subset PseudoExp (B) \subset PseudoExp (A)$ , where we are restricting elements of $Exp (B)$ and $PseudoExp (B)$ to $Var (A)$ .

Lemma

For any consistent theory $A$ , $Exp (A)$ is nonempty.

This follows from the isomorphism $ι^{- 1}$ ; we take a completion of $A$ , which is a coherent probability distribution $P$ over $A$ , and then take expectations according to $P$ . That is, $ι^{- 1} (P) \in Exp (A)$ . $⊣$

We assume that we have some standard model for the theory over which $T$ was constructed. For concreteness we take that theory to be $Z F C$ , and we take the standard model to be the cumulative hierarchy $V$ .

Theorem

$Exp (T)$ is nonempty, and for all $J \in PseudoExp (T)$ , we have that $(V, J) ⊨ T$ .

(To follow the proof, keep in mind the distinction between $E$ being a (pseudo)expectation over a theory, versus $E$ providing a model for a theory.)

Proof. The claim is true for $T_{0}$ in place of $T$ , since $T_{0}$ is consistent and places no restrictions other than linearity on $E$ .

Say the claim holds for $T_{n}$ , so $PseudoExp (T_{n})$ is non-empty. For any $J \in PseudoExp (T_{n})$ , by hypothesis $(V, J) ⊨ T_{n}$ . Also, by definition of $PseudoExp (T_{n})$ , $J$ satisfies that whenever $T_{n}$ bounds $φ$ in $[a, b]$ , also $J [φ] \in [a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]$ . Hence $(V, J) ⊨ T_{n + 1}$ . Thus $T_{n + 1}$ is consistent. Since $PseudoExp (T_{n + 1}) \subset PseudoExp (T_{n})$ , this also shows that for all $J \in PseudoExp (T_{n + 1})$ , we have $(V, J) ⊨ T_{n + 1}$ .

By induction the claim holds for all $n$ , and hence $T$ is consistent and $Exp (T)$ is nonempty. Since $PseudoExp (T) \subset PseudoExp (T_{n})$ for all $n$ , for any $J \in PseudoExp (T)$ we have $(V, J) ⊨ T_{n}$ , and hence $(V, J) ⊨ T$ . $⊣$

Main theorem: reflection and assigning probability 1 to reflection

We have a theory $T$ that is consistent, so that $Exp (T)$ is nonempty, and sound over all pseudo-expectations. We want an expectation that is reflective, and also believes that it is reflective. First we formalize this notion and show that there are reflective expectations.

Existence of reflective expectations

Define the sentence $refl := \forall n \in N : (E [n] \text{ defined}) \to (E [┌ E [n] ┐] \text{ defined, and } E [┌ E [n] ┐] = E [n]) .$

This says that whenever $E$ is defined on some variable, it expects $E$ to take some value on that variable, and it expects the correct value. In short, its expectations about its expectations are correct. Define $Refl (T) \subset Exp (T)$ to be the reflective expectations over $T$ , i.e. those that satisfy $refl$ .

Some observations: the spaces $Refl (T) \subset Exp (T) \subset PseudoExp (T) \subset \prod_{φ \in Var (T)} [a - (b - a) (2^{┌ φ ┐}), b + (b - a) (2^{┌ φ ┐})]_{T, φ}$ are all compact, as they are closed subsets of the product of the loose bounds on $Var (T)$, that product being a compact space. Both $Exp (T)$ and $PseudoExp (T)$ are convex, as linearity and being in bounds are preserved by convex combinations. (For the same reason, $Refl (T)$ is convex, and is in fact an affine subspace of $Exp (T)$.)

Lemma

$Refl (T)$ is nonempty.

Proof. We apply Kakutani’s theorem to $Exp (T)$ where $G$ corresponds to $F$ when $\forall φ \in Var (T) : G [E [┌ φ ┐]] = F [φ]$ . The set of $G$ corresponding to $F$ is compact and convex, and the graph is closed. For any $F$ there is a corresponding $G$ : we take an expectation over the theory

$T_{F} := T + {E [┌ φ ┐] \in (a, b) ∣ a, b \in Q, F [φ] \in (a, b)}$

stating that $E$ behaves according to $F$ . This theory $T_{F}$ is consistent because $F$ provides a model. Any completion $T_{F}^{'}$ has $T_{F}^{'} (E [┌ φ ┐]) = F [φ]$ , so the resulting expectation corresponds to $F$ . Kakutani’s theorem gives a fixed point of this correspondence, which is in $Refl$ . $⊣$

The correspondence $⊲_{E}$ : exact reflection and assigning high probability to reflection for distributions close to reflective

We can’t simply take a correspondence $⊲_{E}$ that also requires $G$ to assign probability 1 to $refl$ ; in general there would not be any expectation corresponding to any $F \in Exp (T) - Refl (T)$ . Instead we will soften this requirement, and only require that $G [refl]$ approach 1 as $F$ approaches being reflective, in order for $F ⊲_{E} G$ .

Definition

Define a metric on $Exp (T)$ by

$d (F, G) := \sum_{φ \in Var (T)} \frac{| F [φ] - G [φ] |}{2^{┌ φ ┐} | [a, b]_{T, φ} |} .$

(If $| [a, b]_{T, φ} | = 0$ then the $φ$ coordinate plays no role in the metric by fiat.) $=:$

The factor of $1 / 2^{┌ φ ┐}$ ensures that the sum defining the metric converges, since the factor of $1 / | [a, b]_{T, φ} |$ rescales the projection of $Exp (T)$ in each coordinate to an interval of length at most 1.

We abbreviate $d ⟨ F ⟩ := d (F, Refl) = {min}_{H \in Refl} d (F, H)$ to mean the distance from $F$ to the nearest element of the set $Refl$ . Since $Refl$ is compact, this is well-defined and continuous on $Exp (T)$ .
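
A finite truncation of this metric is easy to compute. The Python sketch below assumes only three variables, keyed by hypothetical Gödel numbers 1, 2 and 3, with made-up $T$ -bounds; the real sum ranges over all of $Var (T)$.

```python
from fractions import Fraction as Fr

# Toy assumption: finitely many variables, keyed by Goedel number,
# each with its T-bound [a, b].
bounds = {1: (Fr(0), Fr(1)), 2: (Fr(-1), Fr(1)), 3: (Fr(5), Fr(5))}

def d(F, G):
    """Weighted distance between two assignments of values to variables."""
    total = Fr(0)
    for code, (a, b) in bounds.items():
        width = b - a
        if width == 0:
            continue  # degenerate bound: coordinate ignored by fiat
        total += abs(F[code] - G[code]) / (2**code * width)
    return total

# Two assignments at opposite ends of every non-degenerate bound.
F = {1: Fr(0), 2: Fr(1), 3: Fr(5)}
G = {1: Fr(1), 2: Fr(-1), 3: Fr(5)}
farthest = d(F, G)
```

Even for these maximally separated assignments, each coordinate contributes at most $1 / 2^{┌ φ ┐}$, so `farthest` is 1/2 + 1/4 = 3/4.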

Definition

For $F, G \in Exp (T)$ , we say that $G$ reflects $F$ and we write $F ⊲_{E} G$ precisely when:

$G$ expects $E$ to behave just like $F$ , i.e. $\forall φ \in Var (T) : G [E [┌ φ ┐]] = F [φ]$ , and
$G$ is somewhat confident that $E$ is reflective, specifically $G [refl] \geq 1 - d ⟨ F ⟩$ .

$=:$

Fixed points of the correspondence are reflective and believe they are reflective

Say $G ⊲_{E} G$. Then $G \in Refl (T)$, by definition of $⊲_{E}$. In particular, $d ⟨ G ⟩ = 0$, so that $G [refl] = 1$, and $G$ is the desired distribution.

Compact and convex images; closed graph

For a fixed $F$ , the conditions for $F ⊲_{E} G$ are just closed subintervals in some coordinates, so ${G ∣ F ⊲_{E} G}$ is compact and convex.

Consider a sequence $F_{0} ⊲_{E} G_{0}, F_{1} ⊲_{E} G_{1}, \dots,$ converging to $F$ and $G$ . For $φ \in Var (T)$ , since $G_{n} [E [┌ φ ┐]] = F_{n} [φ] \to F [φ]$ , we have $G_{n} [E [┌ φ ┐]] \to G [E [┌ φ ┐]] = F [φ]$ . Also, since $d ⟨ F_{n} ⟩ \to d ⟨ F ⟩$ , we have that the $G_{n} [refl] \geq 1 - d ⟨ F_{n} ⟩$ converge to something at least $1 - d ⟨ F ⟩$ , so $G [refl] \geq 1 - d ⟨ F ⟩$ . Thus $⊲_{E} \subset Exp (T) \times Exp (T)$ is closed.

Images of the correspondence are nonempty: interpolating reflective and pseudo-expectations

Finally, we need to show that for any $F \in Exp (T)$, there is some $G \in Exp (T)$ such that $F ⊲_{E} G$. (The case distinction is just for explanatory purposes.)

Case 1. $F \in Refl (T)$ .

Recall the theory $T_{F} := T + {E [┌ φ ┐] \in (a, b) ∣ a, b \in Q, F [φ] \in (a, b)}$ stating that $E$ behaves according to $F$ . By the theorem about $T$ , $(V, F) ⊨ T$ , so along with $F \in Refl (T)$ we also have $(V, F) ⊨ T_{F} + refl$ . Thus that theory is consistent, so we can take some $G \in Exp (T_{F} + refl)$ . This $G$ expects $E$ to behave like $F$ , and $G [refl] = 1 \geq 1 - d ⟨ F ⟩ = 1$ .

Case 2. $F \notin Refl (T)$ .

Pick some $H \in Refl (T)$ with $d (F, H) = d ⟨ F ⟩ > 0$ . As in the previous case, find some $G_{H} \in Exp (T_{H} + refl)$ , so $G_{H}$ expects $E$ to behave like $H$ , and $G_{H} [refl] = 1$ . We will define $G$ with $F ⊲_{E} G$ by taking a convex combination of $G_{H}$ with another $G_{J} \in Exp (T)$ :

$G := (1 - d ⟨ F ⟩) G_{H} + d ⟨ F ⟩ G_{J} .$

By convexity, $G \in Exp (T)$ , and since $G_{J} [refl] \in [0, 1]$ , we will have $G [refl] \geq (1 - d ⟨ F ⟩)$ as desired.

However, we also need $G [E [┌ φ ┐]] = F [φ]$. That is, we need

$\begin{matrix} ((1 - d ⟨ F ⟩) G_{H} + d ⟨ F ⟩ G_{J}) [E [┌ φ ┐]] & = F [φ] \\ G_{J} [E [┌ φ ┐]] & = \frac{F [φ] - (1 - d ⟨ F ⟩) G_{H} [E [┌ φ ┐]]}{d ⟨ F ⟩} \\ J [φ] & := \frac{1}{d ⟨ F ⟩} F [φ] + (1 - \frac{1}{d ⟨ F ⟩}) H [φ], \end{matrix}$

where $G_{J}$ believes that $E$ behaves like $J$ . We take the last line to be the definition of $J$ .

In general, this function $J$ is not in $Exp (T)$ . It may be that $d (F, H)$ is very small, but for some large $φ$ , $F [φ]$ is large and $H [φ]$ is small, so that $J [φ]$ is very large and actually outside of $[a, b]_{T, φ}$ , and hence not an expectation. However, $J$ is, in fact, a pseudo-expectation over $T$ :

$J [φ] = H [φ] + \frac{1}{d ⟨ F ⟩} (F [φ] - H [φ]),$

so $J [φ] \in [a - K, b + K]$, where $H [φ] \in [a, b]_{T, φ}$ and $K := \frac{1}{d ⟨ F ⟩} (| F [φ] - H [φ] |)$. That is, the claim is that $K \leq (b - a) (2^{┌ φ ┐})$. Indeed:

$\begin{matrix} K & = \frac{1}{d ⟨ F ⟩} (| F [φ] - H [φ] |) = \frac{| F [φ] - H [φ] |}{d (F, H)} = \frac{| F [φ] - H [φ] |}{\sum_{ψ \in Var (T)} \frac{| F [ψ] - H [ψ] |}{2^{┌ ψ ┐} | [a, b]_{T, ψ} |}} \leq \frac{| F [φ] - H [φ] |}{\frac{| F [φ] - H [φ] |}{2^{┌ φ ┐} | [a, b]_{T, φ} |}} = 2^{┌ φ ┐} | [a, b]_{T, φ} | = (b - a) (2^{┌ φ ┐}) . \end{matrix}$

Therefore $J \in PseudoExp (T)$ . By the theorem on $T$ , $(V, J) ⊨ T$ , so that $T_{J}$ is consistent and we obtain $G_{J} \in Exp (T)$ that expects $E$ to behave like $J$ . Then $G = (1 - d ⟨ F ⟩) G_{H} + d ⟨ F ⟩ G_{J}$ is in $Exp (T)$ , expects $E$ to behave like $F$ , and has $G [refl] \geq (1 - d ⟨ F ⟩)$ . That is, $F ⊲_{E} G$ .
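
The counterbalancing can be checked on a toy instance. The Python sketch below assumes just two variables with hypothetical Gödel numbers 1 and 2 and $T$ -bound $[0, 1]$ each, and simply stipulates an `H` to play the role of the nearest reflective expectation; it only verifies the arithmetic of the mixture, not reflectivity.

```python
from fractions import Fraction as Fr

# Toy assumption: two variables keyed by Goedel number, each with T-bound [0, 1].
bounds = {1: (Fr(0), Fr(1)), 2: (Fr(0), Fr(1))}
F = {1: Fr(1, 2), 2: Fr(1)}      # the expectation to be reflected
H = {1: Fr(1, 2), 2: Fr(3, 4)}   # stand-in for the nearest reflective expectation

def dist(X, Y):
    """Finite truncation of the metric d."""
    return sum(abs(X[c] - Y[c]) / (2**c * (b - a)) for c, (a, b) in bounds.items())

dF = dist(F, H)  # plays the role of the distance from F to Refl

# J is the affine combination that counterbalances H.
J = {c: F[c] / dF + (1 - 1 / dF) * H[c] for c in bounds}

# What the mixture (1 - dF) G_H + dF G_J expects the symbol E to say.
mix = {c: (1 - dF) * H[c] + dF * J[c] for c in bounds}

# J must stay inside the loose bounds to be a pseudo-expectation.
J_in_loose_bounds = all(
    a - (b - a) * 2**c <= J[c] <= b + (b - a) * 2**c for c, (a, b) in bounds.items()
)
```

In this instance `dF` is 1/16 and `J[2]` is 19/4, which lies outside $[0, 1]$ but inside the loose bound $[-4, 5]$, and `mix` agrees with `F` on both variables.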

The conditions of Kakutani’s theorem are satisfied, so there is a fixed point $E ⊲_{E} E$ , and therefore we have an expectation that believes $E$ behaves like itself, and that assigns probability 1 to $E$ having this property. $⊣$

Extension to belief in any generic facts about $Refl$

The above argument goes through in exactly the same way for any statement $θ$ that is satisfied by all reflective expectations; we just have $G_{H}$ also assign probability 1 to $θ$ , and modify $⊲_{E}$ by adding a condition for $θ$ analogous to that for $refl$ . For example, we can have our reflective $E$ assign probability 1 to $E \in Exp (T)$ , which is analogous to an inner coherence principle.

Discussion

I think that if the base theory is strong enough to prove $Exp (T) ≅ Δ (T)$ , then this whole argument can be carried out with $E$ defined in terms of $P$ , a symbol for a probability distribution, and so we get a probability distribution over the original language with the desired beliefs about itself as a probability distribution.

I think it should be possible to have a distribution that is reflective in the sense of $⊲_{E}$ be definable and reflective for its definition, using the methods from this post. But it doesn’t seem as straightforward here. One strategy might be to turn the sentence in the definition of $T_{n + 1}$ , stating that $E$ is in the loose $T_{n}$ -bounds on variables, into a schema, and diagonalizing at once against all the $T_{n}$ refuting finite behaviors. But, the proof of soundness of $T$ over pseudo-expectations, and diagonalizing also against refuting finite behaviors in conjunction with $refl$ , seems to require a little more work (and may be false).

It would be nice to have a good theory of logical probability. The existence proof of an expectation-reflective distribution given here shows that expectation-reflection is a desideratum that might be achievable in a broader context (i.e. in conjunction with other desiderata).

I don’t know what class of variables a $⊲_{E}$ -reflective $E$ is reflective for. Universes that use $E$ in a way that only looks at $E$ ’s opinions on variables in $Var (T_{n})$ for some $n$ , and are defined and uniformly bounded whenever $E$ is in $PseudoExp (T_{n})$ , will be reflected accurately. If the universe looks at all of $E$ , and for instance does something crazy if $E$ is not in $Exp (T)$ , then $T$ may not be able to prove
