Subsequence Induction (draft)

Scott Garrabrant

This post is a draft, I have not written up the relevant proof yet, and when I do, I will probably change the notation around. The basic idea is that we do resource bounded Solomonoff induction, but we allow Turing machines to only make predictions on a subset of the bits. If a Turing machine does not make a prediction for a given bit, we treat it as though it assigned whatever probability this algorithm assigned for that bit.

Here, we define another ALU algorithm proposal. This one, we can prove has very nice properties. However, it only works in a very limited domain.

Consider an environment $E$ , which on input $n$ runs in $A (n)$ time, and we want to predict it in $g (n)$ time. The following algorithm only works when $A (n) >> g (n) >> A (n - 1)$ . This is a very restrictive condition. However, we are working on some ways that might be able to use this algorithm as a subroutine, and generalize it to a larger domain. (So far,, all of our attempts to extend it have failed, but we are still working on it.)

Consider an enumeration $T_{1}, T_{2}, \dots$ of all Turing machines. For this problem, we will use finite deterministic Turing machines. Assume that the first Turing machine outputs $1 / 2$ immediately for all inputs.

For each $N$ , we define a weight function $w_{N}$ on Turing machines $T_{n}$ with $n \leq N$ as follows.

$w_{N} (T_{n}) = 2^{- n} N - 1 \prod i = n \frac{| 1 - E (i) - T_{n} (i) |}{| 1 - E (i) - M (i) |},$ where $T_{n} (i)$ is the output of $T_{n}$ on input $i$ if $T_{n}$ on input $i$ outputs a rational probability in time at most $f (i)$ . If $T_{n}$ does not output in time or does not output a probability, then $T_{n} (i)$ is defined to equal $M (i)$ , which is defined by:

$M (N)$ is the the weighted average of $T_{n} (N)$ over all $n \leq N$ such that $T_{n}$ on input $i$ outputs a rational probability in time at most $f (i)$ , where $T_{n}$ gets weight $w_{N} (T_{n})$ .

Note that $M (N)$ is recursively defined, since $w_{N} (T_{n})$ is computed using $M (i)$ for values of $i$ less than $N$ . Let $f$ be a function such that $M (N)$ can be computed in time $g (n)$ .

$M (N)$ has the following very valuable property. For any Turing machine $T$ , let $S (T)$ be the set of all natural numbers $n$ on which $T$ outputs a probability in time $f (n)$ on input $n$ . Then, there exists a nonzero constant $c$ such that for all $n$ , the product of the probabilities that $M$ assigns to the correct answer on the first $n$ bits of $S (T)$ is at least $c$ times the product of the probabilities that $T$ assigns to the correct answer on the same bits.

This condition implies that $M$ passes the generalized Benford test for rational coins, but it is actually much stronger. $M$ not only dominates any Turing machine which runs in time $f$ , but also dominates any such Turing machine on any quickly computable subsequence.

[-]Paul Christiano8y30

In online learning this is called the sleeping experts setting (we say an expert is "asleep" when it declines to make a prediction).

The algorithm in this post is the standard algorithm in the sleeping experts setting, though I don't actually know a good reference. I think it is described in the notes for this course.

Edit: I think that course actually calls the sleeping experts "specialists," if you are trying to ctrl+F for it.

[-]orthonormal9y10

This is a big deal! I like my analogy that $M$ is a prediction market where different $T_{n}$ can place bets on subsequences of $E$ (relative to the market price) and be rewarded based on their success, and the dominance property plays the role of the efficient market hypothesis...

Corrections, I think:

$T_{n}$ gets weight proportional to $w_{N} (T_{n})$

Let $f$ be a function such that $M (N)$ can be computed in time $g (N)$ (not $g (n)$ ).