Oracle machines for automated philosophy

AI ALIGNMENT FORUM
AF

Oracle machines for automated philosophy — AI Alignment Forum

This article describes an idea of Paul Christiano's. Paul described it to Benja Fallenstein and Benja described it to me. It's a way to use reflective oracle machines as an aid in answering philosophical questions.

Edit, 7am Pacific Time, 17 February 2015: Made a major correction.

Recall that a reflective oracle machine is a pair $(M, O)$ , where $M$ is a probabilistic Turing machine with an advance-only output tape and the ability to make calls to the external oracle function $O$ . $O$ is a function $M \times B^{< ω} \times Q \to Δ (B)$ satisfying certain properties. It takes as input a representation of any such Turing machine $N$ , a finite bitstring $¯ ¯ ¯ x$ , and a rational number $q$ . It outputs a $0$ or $1$ depending on whether $q$ is greater than or less than the probability that $N$ 's output begins with $¯ ¯ ¯ x$ . If the probability is exactly $q$ then $O (N, ¯ ¯ ¯ x, q)$ is random. If $N$ fails to write an infinite sequence of bits, the oracle is allowed to pretend that it does. See Benja's post for details.

#Solomonoff induction

Benja's post also mentions that we can make a Solomonoff inductor out of a reflective oracle machine: Write a program that simply iterates over all $M$ and uses $O$ to determine what the output of each $M$ is. This program will be able to update a universal prior on observations and sample predictions from the posterior. The program will be very slow; call it $C$ . We can write another program that calls $O$ on $C$ and get our answer very fast.

Suppose we had such an inductor. If we feed it enough real-world data, it will be able to make very good predictions, assuming a version of the Church-Turing thesis that admits the existence of stochastic phenomena and the existence of reflective oracle machines.

We will suppose our inductor is implemented as a software object with two callable methods, observe() and query(). observe( $x$ ) appends the bitstring $x$ plus a reserved end-of-message code to an internal list of observations. query( $x$ ) returns the message $y$ maximizing $P ((o, x, y) | (o, x))$ subject to the constraint that $y$ must be terminated by the end-of-message code, where $o$ is the internal list of observations and $(,)$ indicates concatenation. In other words, it conditions on $x$ and returns the maximum-likelihood message following $x$ . An alternative approach would be to condition on $x$ and sample $y$ from the posterior.

#Automated philosophy

Suppose we want to solve a philosophical problem, $Q$ , in a month. It would be nice if a philosopher had access to a device which, if given a question $q$ , creates an alternative universe with a copy of that philosopher initialized to some initial state, except that instead of being given the original question $Q$ to work on, the copy of the philosopher is given $q$ . After the copy works on $q$ for a month, they submit their answer, the alternate universe is terminated, and the answer is returned to the original philosopher, with no time having passed in the original universe. Thus the philosopher can freely converse with a device that is qualitatively as smart as the philosopher except it can do a month's work of thinking instantaneously. It would be even better if the copy of the philosopher in the alternate universe can submit questions of their own to their own copy of the device.

We are going to describe a way to effectively build such a device out of the inductor from the previous section. Suppose we have a question of interest $Q$ , a human, and an inductor. We choose a recursion depth $D \in N$ . The protocol is as follows:

A question $q$ and recursion depth $d$ are sampled from a certain distribution described below.
The human is presented with $q$ and thinks about it for a month. If d > 0, the human may freely submit any number of questions $q_{i}$ , $i = 1$ to $n$ . a. For each $q_{i}$ , we call inductor.query( $(q_{i}, d - 1)$ ). I.e., conditioning on observing the pair $(q_{i}, d - 1)$ , what is the inductor's prediction about the next observation? b. The answer $a_{i}$ is displayed to the human, where $(a_{i}, q^{'})$ is the value returned by inductor.query().
After a month, the human settles on a final answer $a$ to the question $q$ .
We call inductor.query( $(Q, D)$ ). The return value is a pair $(A, q^{'})$ , and $A$ provides the answer to our question $Q$ .
We randomly sample $i$ from ${1, \dots, n}$ and call inductor.observe( $(q, d, a, q_{i})$ ).

Note that even though we get our answer in step 4, we still have to reliably perform step 5, no matter what the answer is. Otherwise the inductor won't give us an accurate answer in step 4.

Note also that we could perform step 4 at the very beginning of the protocol. But we would have to be careful to prevent the human from learning the result until the end of the protocol.

The random sample in step 1 should be truly random, so that the inductor cannot predict it in advance. Sampling from a uniform distribution might have strange effects, because conditioning on $(q, d) = (Q, D)$ in step 4 effectively discards the hypothesis that $(q, d)$ is drawn from a uniform distribution. Instead we want a distribution that favors the original question $Q$ and sub-questions $q$ that naturally follow from it. So we use the following distribution instead: $(q, d) = {\begin{matrix} (Q, D) & with probability \frac{1}{2} (q^{'}, d - 1) & (q, d, a, q^{'}) sampled from inductor.query(ε) with probability \frac{1}{2} \end{matrix}$

( $ε$ is the empty string.)

The reason we use a recursion depth limit is that otherwise the result might be a fixed point of rumination chosen by the oracle inside the inductor rather than the result of a lot of thought.

Care would have to be taken to make sure that the human is motivated to work on the assigned query $q$ rather than the bigger-picture $Q$ ; they would have to be prevented from querying the inductor when $d = 0$ or making other queries not allowed by the protocol; perhaps they should be discouraged from making frivolous queries that would test their own patience; and of course, the protocol won't produce results that are qualitatively wiser than the human at its heart.

#Applications

The inductor is an impossible object. But any machine that's good at making predictions is in some sense an approximate inductor. If we had a theory of what it means for a machine to be approximately reflective, and we had a machine that was good at predicting people and which we judged to be sufficiently reflective, we might implement a version of the above protocol to augment philosophical work. Thus, while the advent of powerful predictive AI will hasten the advent of dangerous artificial agents, it might also hasten progress on philosophical problems like those in MIRI's technical research agenda.