Multibit reflective oracles

AI ALIGNMENT FORUM
AF

Multibit reflective oracles — AI Alignment Forum

This post describes a new version of the reflective oracles I described in a previous forum post. This work extends probabilistic Turing machines with an oracle that answers certain kinds of queries about other probabilistic Turing machines with the same kind of oracle; straight-forward ways of doing this would lead to diagonalization problems, which this kind of oracle avoids by answering certain queries probabilistically. Applications include a variant of AIXI that is able to reason about universes containing other instances of the same variant of AIXI; I expect we'll have forum posts about this in the coming weeks.

The new version, which is based on this comment thread with Paul, is significantly simpler to use than the previous version: The previous version only gave useful answers about machines that halt with probability one, no matter how the oracle behaves. The new version basically just doesn't have an analogous requirement. In addition, the new version allows for queries about machines that output more than a single bit; it turns out that this makes it much simpler to define simplicity priors of the kind used in AIXI, doing away with most of the complexity I described in "Simplicity priors with reflective oracles".

This is still a fairly technical post, whose purpose is to help with figuring out the details of a nice version reflective oracles and of their applications. At some point in the future when all of this has stabilized more, I expect to write a more introductory post on these topics.

ETA: Thanks to Matt Elder, who helped me think through the details of this system! I wrote the original post pretty quickly and forgot to acknowledge this, mea culpa.

Let's write $B := {0, 1}$ for the set of bits, $B^{< ω}$ for the set of finite bitstrings, and $B^{ω}$ for the set of infinite streams of bits; further, let's write $ε$ for the empty bitstring. Then $Δ (B)$ is the set of probability distributions over single bits, which we can represent by a single number in $[0, 1]$ , specifying the probability that the bit is $1$ ; that is, we can think of $Δ (B)$ as the set $[0, 1]$ . Moreover, $Δ (B^{ω})$ is the set of probability distributions over infinite bitstreams, which we can represent by functions $P : B^{< ω} \to [0, 1]$ , where $P (¯ ¯ ¯ x)$ represents the probability that the infinite bitstream starts with the finite bitstring $¯ ¯ ¯ x$ , such that $P (ε) = 1$ (every bitstream starts with the empty bitstring), and $P (¯ ¯ ¯ x) = P (¯ ¯ ¯ x 0) + P (¯ ¯ ¯ x 1)$ , $\forall ¯ ¯ ¯ x \in B^{< ω}$ . In other words, we can think of $Δ (B^{ω})$ as the set of all functions $P : B^{< ω} \to [0, 1]$ which satisfy these two conditions.

Let's also write $P := Q \cap [0, 1]$ for the set of probabilities that are rational numbers, and $M \subseteq N$ for the set of all Gödel numbers of probabilistic Turing machines that can make calls to a probabilistic oracle (i.e., to an oracle that may answer some queries probabilistically). We will consider machines with advance-only binary output tapes---that is, the machine has instructions to write a $0$ or a $1$ on its output tape, and no way of going back and deleting something it has written; thus, a machine's output is either an element of $B^{< ω}$ (if the machine halts at some point, or runs forever but stops producing output after a finite amount of time), or an element of $B^{ω}$ (if the machine keeps producing output).

A query to the oracle is a triple $(┌ M ┐, ¯ ¯ ¯ x, q) \in (M \times B^{< ω} \times P)$ , roughly interpreted as asking whether the probability that the output of the oracle machine $M$ starts with $¯ ¯ ¯ x$ is greater than $q$ . The output of the oracle is a single bit, which, on some queries, may be chosen randomly: If $M$ definitely produces at least $l e n (¯ ¯ ¯ x)$ bits of output, and $q$ is greater than the probability that the output starts with $¯ ¯ ¯ x$ , the oracle definitely outputs $1$ ; if $q$ is smaller, it definitely outputs $0$ ; if it is equal, the oracle's output may be random, with the probability of outputting $1$ anywhere in $[0, 1]$ . (This is crucial for getting around diagonalization.) If $M$ isn't guaranteed to produce at least $l e n (¯ ¯ ¯ x)$ bits of output, then see below.

In the new formulation, an oracle $O$ is a pair of functions, $O = (O_{q u e r y}, O_{e v a l})$ . The first of these, $O_{q u e r y} : M \times B^{< ω} \times P \to Δ (B)$ , specifies the behavior of the oracle: $O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q)$ is the probability that the oracle outputs $1$ when called on the query $(┌ M ┐, ¯ ¯ ¯ x, q)$ .

The second function has to do with what happens when we query the oracle about machines which aren't guaranteed to produce $l e n (¯ ¯ ¯ x)$ bits. In an intuitive sense, the idea is that we pretend that every machine always produces an infinite bitstream, even if in reality, the machine goes into an infinite loop after producing only a finite output; in this case, we choose some arbitrary probability distribution over the "output" we pretend the machine produces after this point. The function $O_{e v a l} : M \to Δ (B^{ω})$ specifies the resulting probability distribution over infinite bitstreams; that is, $O_{e v a l} (┌ M ┐)$ is the distribution that the oracle pretends the machine $M$ produces. Since we represent elements of $Δ (B^{ω})$ as functions from $B^{< ω}$ to $[0, 1]$ , this means that $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ is what our oracle pretends is the probability that $M$ 's output starts with $¯ ¯ ¯ x$ . In particular, if $q > O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , then invoking the oracle on $(┌ M ┐, ¯ ¯ ¯ x, q)$ will definitely output $1$ , i.e., then $O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ ; and if $q < O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , then $O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q) = 0$ .

The intuition for $O_{e v a l} (┌ M ┐)$ is that it should only "fill in" probabilities when actually running $M$ stops generating any output; as long as $M$ keeps generating output, the distribution $O_{e v a l} (┌ M ┐)$ should reflect this. We can formalize this by the condition that for any bitstring $¯ ¯ ¯ x \in B^{< ω}$ , the probability $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ should be greater or equal to the probability that $M$ outputs $¯ ¯ ¯ x$ given that calls to the oracle behave according to $O_{q u e r y}$ .

At the end of this post, I'll show that a pair $O = (O_{q u e r y}, O_{e v a l})$ satisfying these conditions does in fact exist.

First, though, let me give you an indication about why it seems useful to move from single-bit to multi-bit oracles. I should say, first of all, that they aren't actually more powerful than single-bit oracles---Marcello has shown that you can construct a multi-bit oracle given an oracle that works as described above, but allows only for single-bit queries---but it's not as easy as you might think; Marcello's construction uses three different tricks which you'd need to wrap your head around first. It seems easier to define the multi-bit oracle directly.

The place where multi-bit oracles are useful is when you want to define a simplicity prior, as in Solomonoff induction. In the reflective oracle setting, the natural way of doing so is to define a universal prior over bitstrings as an oracle machine that samples the Gödel number of another machine $M^{'}$ with probability proportional to $2^{- source code length}$ , and then just runs $M^{'}$ . We'd then like to use our oracle to answer queries about the resulting distribution over bitstrings. However, like in the case of Solomonoff induction, we will need to deal with the fact that not all machines $M^{'}$ produce an infinite amount of output. It would be nice if we could just check whether a given $M^{'}$ eventually loops, and don't run it if it does, but we won't be able to define an oracle that lets us do so (for machines with access to the same oracle), because of the halting problem: otherwise, we could write an $M^{'}$ that asks the oracle whether it goes into an infinite loop, and does so iff the oracle says it doesn't.

I've previously described a bag of tricks for solving this problem in the context of my original definition of reflective oracles, but with the definition given above, the problem becomes trivial: We just define our simplicity prior as a machine $M$ that samples an arbitrary $M^{'}$ and runs it, and then use our oracle on $M$ to give us a corresponding distribution over infinite bitstreams. If the $M^{'}$ we sample fails to output an infinite stream of bits, our oracle will fill in the probabilities arbitrarily, so we get a consistent distribution over infinite bitstreams. And we still have the essential property we expect of a simplicity prior: For any machine $M^{'}$ , there is a constant $C > 0$ such that $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ is lower-bounded by $C$ times the probability that $M^{'}$ outputs $¯ ¯ ¯ x$ .

Before giving the existence proof, I'll give a more formal definition of the conditions we place on $O_{q u e r y}$ and $O_{e v a l}$ .

In order to state these conditions, it will be helpful to assume that machines $┌ M ┐ \in M$ are specified in a way which includes the current state of their working tapes: Thus, we can run a machine $M$ for a single step and describe its next state as a new machine $M^{'}$ . (In other words, elements of $M$ are really machine states.) Then, we can describe every machine by its behavior when running it for a single timestep: Either it executes a single deterministic computation, leading a new machine state $N$ , which we write as $s t e p (┌ N ┐)$ ; or it outputs a single bit $x \in B$ and transitions to state $N$ , which we write as $o u t p u t (x, ┌ N ┐)$ ; or it queries the oracle on some triple $(┌ M ┐, ¯ ¯ ¯ x, q)$ , and transitions into state $N$ or $N^{'}$ depending on the answer, which we write as $q u e r y (┌ M ┐, ¯ ¯ ¯ x, q, ┌ N ┐, ┌ N^{'} ┐)$ ; or it flips a fair coin and transitions to $N$ or $N^{'}$ depending on the result, which we write as $f l i p (┌ N ┐, ┌ N^{'} ┐)$ . In a slight abuse of notation, we'll write things like $O_{e v a l} (s t e p (┌ N ┐))$ , which is really supposed to mean " $O_{e v a l} (┌ M ┐)$ for any state $M$ in the set described by $s t e p (┌ N ┐)$ ".

An oracle is a pair $O = (O_{q u e r y}, O_{e v a l})$ , where $O_{q u e r y} : M \times B^{< ω} \times P \to Δ (B)$ and $O_{e v a l} : M \to Δ (B^{ω})$ . We now define a notion of an oracle $O^{'}$ "reflecting" another oracle $O$ ; an oracle is reflective if it reflects itself. An oracle $O^{'} = (O_{q u e r y}^{'}, O_{e v a l}^{'})$ reflects an oracle $O$ if the following conditions are satisfied:

If $q > O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , then $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ .
If $q < O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , then $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 0$ .
$O_{e v a l}^{'} (s t e p (┌ N ┐)) = O_{e v a l} (┌ N ┐)$ .
$O_{e v a l}^{'} (o u t p u t (x, ┌ N ┐)) (x ¯ ¯ ¯ y) = O_{e v a l} (┌ N ┐) (¯ ¯ ¯ y)$ .
$O_{e v a l}^{'} (o u t p u t (x, ┌ N ┐)) ((1 - x) ¯ ¯ ¯ y) = 0$ .
$O_{e v a l}^{'} (f l i p (┌ N ┐, ┌ N^{'} ┐)) = \frac{1}{2} O_{e v a l} (┌ N ┐) + \frac{1}{2} O_{e v a l} (┌ N^{'} ┐)$ .
$O_{e v a l}^{'} (q u e r y (┌ M ┐, ¯ ¯ ¯ x, q, ┌ N ┐, ┌ N^{'} ┐))$ $=$ $(1 - O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q)) \cdot O_{e v a l} (┌ N ┐)$ $+$ $O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q) \cdot O_{e v a l} (┌ N^{'} ┐)$ .

The first two of these conditions just write out in symbols what I earlier said in words. The latter five conditions imply the demand that $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ must be lower-bounded by the probability that $M$ returns $¯ ¯ ¯ x$ if oracle calls behave according to $O_{q u e r y}$ , but this isn't quite as obvious. The proof is to show by induction on $T \in N$ that for all $M$ and $¯ ¯ ¯ x$ , $O_{e v a l} (┌ M ┐, ¯ ¯ ¯ x)$ is $\geq$ the probability that $M$ outputs $¯ ¯ ¯ x$ within the next $T$ timesteps. I'll omit the details here.

As in my previous posts about versions of this system, I'll actually prove a slightly stronger result, which I expect will be helpful in showing a relation between reflective oracles and Nash equilibria. This makes use of the notion of a partial oracle, which is a pair of partial functions $π = (π_{q u e r y}, π_{e v a l})$ , $π_{q u e r y} : M \times B^{< ω} \times P ↛ Δ (B)$ and $π_{e v a l} : M \times B^{< ω} ↛ [0, 1]$ , which specify the values that $O_{q u e r y}$ and $O_{e v a l}$ should take on certain inputs. An oracle extends a partial oracle if $O_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q) = π_{q u e r y} (┌ M ┐, ¯ ¯ ¯ x, q)$ and $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x) = π_{e v a l} (┌ M ┐, ¯ ¯ ¯ x)$ whenever the right-hand side of these equations is defined. We call a partial oracle $π$ reflective if for every $O$ extending $π$ , there is an $O^{'}$ , also extending $π$ , which reflects $O$ .

Theorem.

i. There is a reflective oracle $O$ . ii. For every reflective partial oracle $π$ , there is a reflective oracle $O$ extending $π$ .

Proof. We begin by showing that the empty partial oracle $π^{\emptyset}$ (i.e., the partial oracle satisfying $d o m (π_{q u e r y}^{\emptyset}) = \emptyset$ and $d o m (π_{e v a l}^{\emptyset}) = \emptyset$ ) is reflective, since (i) then follows from (ii). Thus, let $O$ be any oracle (since every oracle extends $π_{\emptyset}$ ), and define $O^{'}$ as follows: If $q > O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , let $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ ; else, let $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 0$ . Define $O_{e v a l}^{'}$ by the equations in the definition of " $O^{'}$ reflects $O$ ". Then, $O^{'}$ clearly both reflects $O$ and (trivially) extends $π^{\emptyset}$ . This finishes the proof that $π^{\emptyset}$ is a reflective partial oracle.

It remains to show (ii). For this, we use the infinite-dimensional generalization of Kakutani’s fixed point theorem:

Suppose that $A$ is a non-empty, convex and compact subset of a locally convex topological vector space. Suppose further that $f : A \to P o w (A)$ is a set-valued function such that $f (x)$ is non-empty, convex and compact for all $x \in A$ , and such that the graph of $f$ , ${(x, y) : x \in A, y \in f (x)},$ is a closed set. Then $f$ has a fixed point; that is, there is an $x \in A$ such that $x \in f (x)$ .

Let $A$ be the set of all oracles. We can see this as a subset of $[0, 1]^{M \times B^{< ω} \times P} \times [0, 1]^{M \times B^{< ω}}$ , which in turn is a subset of $R^{M \times B^{< ω} \times P} \times R^{M \times B^{< ω}}$ , which is a locally convex topological vector space when endowed with the product topology (since this is true of any power of $R$ ). $A$ consists of all elements $(O_{q u e r y}, O_{e v a l})$ of $[0, 1]^{M \times B^{< ω} \times P} \times [0, 1]^{M \times B^{< ω}}$ such that $g$ satisfies the additivity properties we demand of elements of $Δ (B^{< ω})$ , namely $O_{e v a l} (┌ M ┐) (ϵ) = 1$ and $O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x 0) + O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x 1) = O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ . Clearly, $A$ is nonempty, convex, and closed, and it is a subset of a product of $[0, 1]$ , which, by Tychonoff’s theorem, is compact. Moreover, both $M \times B^{< ω} \times P$ and $M \times B^{< ω}$ are countable, so it's sufficient to consider convergence of sequences in $A$ .

We choose $f (O)$ to consist of all oracles $O^{'}$ that reflect $O$ and extend $π$ . By the assumption that $π$ is reflective, $f (O)$ is non-empty for every $O \in A$ . Moreover, it is easy to see that $f (O)$ is both closed and convex; hence, if we can also show that $f$ has closed graph, then by the fixed point theorem, there is a $O \in A$ such that $O \in f (O)$ , which is exactly what we want to show.

Thus, assume that we have sequences $O_{n} = (O_{n, q u e r y}, O_{n, e v a l})$ and $O_{n}^{'} = (O_{n, q u e r y}^{'}, O_{n, e v a l}^{'})$ of oracles extending $π$ such that $O_{n}^{'} \in f (O_{n})$ for every $n \in N$ , and suppose that these sequences converge to limits $O, O^{'} \in A$ ; then we need to show that $O^{'} \in f (O)$ , i.e., that $O^{'}$ reflects $O$ . For the five equalities about $O_{e v a l}^{'}$ in the definition of " $O^{'}$ reflects $O$ ", we simply take the limit $n \to \infty$ of both sides of the equation, using the fact that convergence in the product topology is pointwise.

This leaves the two conditions about $O_{q u e r y}^{'}$ . Since the proof of the second condition is exactly symmetrical, consider without loss of generality the first of these conditions, i.e., "If $q > O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ , then $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ ".

Thus, suppose that indeed $q > O_{e v a l} (┌ M ┐) (¯ ¯ ¯ x)$ . Since $O_{n} \to O$ and convergence is pointwise, there must be an $n_{0}$ such that $q > O_{n, e v a l} (¯ ¯ ¯ x)$ for all $n \geq n_{0}$ . Since $O_{n}^{'}$ reflects $O_{n}$ for all $n$ , it follows that $O_{n, q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ for all $n \geq n_{0}$ , whence by pointwise convergence $O_{q u e r y}^{'} (┌ M ┐, ¯ ¯ ¯ x, q) = 1$ , concluding the proof. □

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

3

3