Asymptotic Logical Uncertainty: Irreducible Patterns

Scott Garrabrant

This post is part of the Asymptotic Logical Uncertainty series. In this post, I will talk more about the exact assumptions we need to make about the sequence of sentences in the Benford Test.

I will start by making a few changes to the previous notation. First, we no longer need the output of $L$ to be computable in bounded time. From now on, we can just say that $L$ is a Turing machine that on input $N$ eventually accepts if $ϕ_{N}$ is provable, eventually rejects if $ϕ_{N}$ is disprovable, and runs forever otherwise.

We will also change the sequence of sentences in the Benford test. Instead of "The first digit of $A (n)$ in base 10 is 1," we will use "The first digit of $3 ↑^{n} 3$ in base 10 is 1." This way, the reasonable answer to every question is $\frac{1}{\log 10}$ (reading $\log$ as base 2, this is $\log_{10} 2 \approx 0.301$), whereas before the powers of 10 were an exception. This change does not make the Benford test weaker, and the program we present will also pass the original Benford test, but it gives many definitions fewer messy cases.

Finally, we will switch to deterministic machines. Instead of making a machine which outputs 1 with the correct probability, we will have a machine that outputs a probability. This clearly makes the problem no easier.

Here is the new version of the Benford test:

Let $M$ be a Turing machine which on input $N$ runs quickly and outputs a probability $M (N)$, which represents the probability assigned to $ϕ_{N}$. We say that $M$ satisfies the Benford test if $\lim_{n \to \infty} M (s_{n}) = \frac{1}{\log (10)},$ where $ϕ_{s_{n}} =$ "The first digit of $3 ↑^{n} 3$ is a 1."
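As a rough numerical illustration of the target value, here is a minimal sketch. The numbers $3 ↑^{n} 3$ are far too large to compute, so the sketch uses the ordinary powers $3^{k}$ as a stand-in sequence; that substitution, the base-2 reading of $\log$, and the function names are all assumptions made for illustration and are not part of the post.

```python
import math

# Target probability, reading "log" as the base-2 logarithm (an assumption;
# under this reading the value equals log10(2), the Benford frequency of digit 1).
BENFORD_P = 1 / math.log2(10)


def first_digit_is_one(n: int) -> bool:
    """True if the decimal expansion of the positive integer n starts with 1."""
    return str(n)[0] == "1"


def leading_one_frequency(base: int, count: int) -> float:
    """Fraction of base**1, ..., base**count whose first decimal digit is 1.

    Powers of a fixed base are only a computable stand-in for 3 ↑^n 3.
    """
    hits = 0
    power = 1
    for _ in range(count):
        power *= base
        hits += first_digit_is_one(power)
    return hits / count


if __name__ == "__main__":
    print(f"1/log2(10)        = {BENFORD_P:.5f}")
    print(f"frequency for 3^k = {leading_one_frequency(3, 2000):.5f}")
```

The stand-in only illustrates the limiting frequency. It says nothing about whether the truth values are hard to predict quickly, which is the extra assumption discussed next.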

In this post, we will not present a solution to the Benford test, but will be very explicit about the assumptions we need to make.

The fact that the best probability to assign to $ϕ_{s_{n}}$ is $\frac{1}{\log (10)}$ depends not only on the fact that the frequency with which $ϕ_{s_{n}}$ is true tends to $\frac{1}{\log (10)}$, but also on the fact that the sequence of truth values of $ϕ_{s_{n}}$ contains no patterns that can be used to quickly compute a better probability on some subsequence. We therefore assume that this sequence of truth values is indistinguishable from a sequence produced by a coin that outputs "true" with probability $\frac{1}{\log (10)}$. Formally, we are assuming that $S = \{s_{n} \mid n \in \mathbb{N}\}$ is an irreducible pattern with probability $\frac{1}{\log (10)}$, as defined below.

Fix a universal Turing machine $\mathrm{UTM}$ and an encoding scheme for machines, and let $\mathrm{UTM} (M, x)$ denote running the machine $\mathrm{UTM}$ to simulate $M$ with input $x$.

Let $S \subseteq \mathbb{N}$ be an infinite subset of natural numbers such that $ϕ_{N}$ is provable or disprovable for all $N \in S$, and there exists a Turing machine $Z$ which on input $N$ runs in time $T (N)$ and accepts $N$ if and only if $N \in S$. We say that $S$ is an irreducible pattern with probability $p$ if there exists a constant $c$ such that for every positive integer $m \geq 3$ and every quickly computable subset $S' \subseteq S$ with at least $m$ elements, we have $| r (m, W) - p | < \frac{c k (W) \sqrt{\log \log m}}{\sqrt{m}},$ where $k (W)$ is the number of bits necessary to encode a Turing machine $W$ such that, for $N \in S$, $\mathrm{UTM} (W, N)$ accepts in time $T (N)$ if and only if $N \in S'$, and $r (m, W)$ is the probability that $ϕ_{N}$ is provable when $N$ is chosen uniformly at random from the first $m$ elements of $S'$.
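To make the inequality concrete, here is a minimal sketch of the two quantities it compares. It makes several simplifying assumptions that are not in the definition: the truth values are given as a finite list indexed by position in $S$, the subset $S'$ is given as a Python predicate on that position rather than as a time-bounded Turing machine, and the description length $k (W)$ is supplied by hand. All names are hypothetical.

```python
import math
from typing import Callable, Sequence


def empirical_rate(truth: Sequence[bool], in_subset: Callable[[int], bool], m: int) -> float:
    """r(m, W): fraction of 'provable' entries among the first m selected positions."""
    picked = [truth[i] for i in range(len(truth)) if in_subset(i)][:m]
    if len(picked) < m:
        raise ValueError("subset has fewer than m elements in the given prefix")
    return sum(picked) / m


def lil_bound(m: int, k_bits: int, c: float) -> float:
    """Right-hand side c * k(W) * sqrt(log log m) / sqrt(m)."""
    if m < 3:
        raise ValueError("m must be at least 3 so that log log m is positive")
    return c * k_bits * math.sqrt(math.log(math.log(m))) / math.sqrt(m)


def passes(truth: Sequence[bool], in_subset: Callable[[int], bool],
           m: int, p: float, k_bits: int, c: float) -> bool:
    """Check the inequality |r(m, W) - p| < bound for one choice of m and W."""
    return abs(empirical_rate(truth, in_subset, m) - p) < lil_bound(m, k_bits, c)


if __name__ == "__main__":
    # An alternating sequence has frequency 1/2 but is easy to predict.
    truth = [i % 2 == 0 for i in range(4000)]
    print(passes(truth, lambda i: True, 1000, 0.5, k_bits=8, c=1.0))
    print(passes(truth, lambda i: i % 2 == 0, 1000, 0.5, k_bits=8, c=1.0))
```

The alternating sequence has the right overall frequency, yet the cheap selector "even positions" picks out a subsequence that is always true. Since the bound shrinks as $m$ grows, no constant $c$ rescues it, which is the kind of pattern the definition is meant to rule out.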

This may seem like an unmotivated definition, but there is a good reason for it. It comes from the Law of the Iterated Logarithm. A coin that outputs true with probability $p$ will pass this test with probability 1. The definition is tight in the sense that we cannot replace the right hand side with something that diminishes more quickly as $m$ increases. It is also important to note that while we think that this is a good definition of the subsequence being quickly indistinguishable from a coin, we really only need it as a necessary condition, so that the sequence of Benford sentences is an irreducible pattern.

Theorem: If we replace provability in the definition of irreducible pattern with a random process, such that for each $N \in S$ the sentence $ϕ_{N}$ is independently called "provable" with probability $p$, then $S$ would almost surely be an irreducible pattern with probability $p$.

Proof: Fix a Turing machine $W$. By the law of the iterated logarithm, there exists a constant $c_{1}$ such that $\limsup_{m \to \infty} \frac{| m r (m, W) - m p |}{\sqrt{m \log \log m}} = c_{1}$ almost surely. Therefore $\sup_{m} \frac{| m r (m, W) - m p |}{\sqrt{m \log \log m}} < \infty$ almost surely. We will use $Φ (W)$ as shorthand for this supremum. Therefore, for any $ε > 0$, there exists a $c_{2}$ such that $Φ (W) > c_{2}$ with probability at most $ε$.
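The statistic $Φ (W)$ can be explored empirically with a small simulation. The sketch below is only illustrative: it assumes $W$ selects every element, it replaces the supremum over all $m \geq 3$ with a maximum over a finite prefix, and the function names and the seeded pseudorandom coin are my own choices.

```python
import math
import random
from typing import Iterable, List


def phi_statistic(flips: Iterable[bool], p: float, m_min: int = 3) -> float:
    """Max over m >= m_min of |(#true among first m) - m p| / sqrt(m log log m).

    This is a finite-prefix stand-in for the supremum, not the supremum itself.
    """
    best = 0.0
    count = 0
    for m, flip in enumerate(flips, start=1):
        count += flip
        if m >= m_min:
            value = abs(count - m * p) / math.sqrt(m * math.log(math.log(m)))
            best = max(best, value)
    return best


def simulate(p: float, length: int, trials: int, seed: int = 0) -> List[float]:
    """Sample the prefix statistic for several independent coin sequences."""
    rng = random.Random(seed)
    return [
        phi_statistic((rng.random() < p for _ in range(length)), p)
        for _ in range(trials)
    ]


if __name__ == "__main__":
    samples = simulate(p=1 / math.log2(10), length=20000, trials=20)
    print(f"largest prefix statistic over 20 runs: {max(samples):.3f}")
```

In runs like this the sampled values stay bounded, which is consistent with (but of course does not prove) the claim that the supremum is almost surely finite.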

We now show that the probability that $Φ (W) > 2 c_{2} + 1$ is at most $ε^{2}$. It suffices to show that the probability of $Φ (W) > 2 c_{2} + 1$ given $Φ (W) > c_{2}$ is at most $ε$. Let $m_{1}$ be the first $m$ such that $\frac{| m r (m, W) - m p |}{\sqrt{m \log \log m}} > c_{2}.$ It suffices to show that the probability that there exists an $m_{2}$ with $\frac{| m_{2} r (m_{2}, W) - m_{2} p |}{\sqrt{m_{2} \log \log m_{2}}} - \frac{| m_{1} r (m_{1}, W) - m_{1} p |}{\sqrt{m_{1} \log \log m_{1}}} > c_{2}$ is at most $ε$.

Observe that $\frac{| m_{2} r (m_{2}, W) - m_{2} p |}{\sqrt{m_{2} \log \log m_{2}}} - \frac{| m_{1} r (m_{1}, W) - m_{1} p |}{\sqrt{m_{1} \log \log m_{1}}} \leq \frac{| m_{2} r (m_{2}, W) - m_{1} r (m_{1}, W) - (m_{2} - m_{1}) p |}{\sqrt{(m_{2} - m_{1}) \log \log (m_{2} - m_{1})}},$ and that the probability there exists an $m_{2}$ with $\frac{| m_{2} r (m_{2}, W) - m_{1} r (m_{1}, W) - (m_{2} - m_{1}) p |}{\sqrt{(m_{2} - m_{1}) \log \log (m_{2} - m_{1})}} > c_{2}$ is the same as the probability that $Φ (W) > c_{2}$, which is at most $ε$.

We have thus shown that for every $ε,$ there exists a constant $c_{3} = c_{2} + 1$ such that the probability that $Φ (W) \geq 2^{ℓ} c_{3}$ is at most $ε^{2^{ℓ}}$ .
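The step from the single doubling argument to the $2^{ℓ}$ statement is an iteration: each application sends a threshold $c$ to $2 c + 1$ and squares the probability bound. Starting from $c_{2}$, after $ℓ$ applications the threshold is $2^{ℓ} (c_{2} + 1) - 1$, which is below $2^{ℓ} c_{3}$, and the bound is $ε^{2^{ℓ}}$. Here is a small sketch that checks this closed form; the function names are hypothetical, and this is my reading of how the iteration goes rather than something spelled out in the post.

```python
from fractions import Fraction
from typing import Tuple


def iterate(c2: int, eps: Fraction, steps: int) -> Tuple[int, Fraction]:
    """Apply (c, prob) -> (2c + 1, prob**2) the given number of times."""
    c, prob = c2, eps
    for _ in range(steps):
        c = 2 * c + 1
        prob = prob * prob
    return c, prob


def closed_form(c2: int, eps: Fraction, steps: int) -> Tuple[int, Fraction]:
    """Closed form: threshold 2**steps * (c2 + 1) - 1, bound eps**(2**steps)."""
    return 2 ** steps * (c2 + 1) - 1, eps ** (2 ** steps)


if __name__ == "__main__":
    for steps in range(5):
        # Exact rational arithmetic, so the comparison is not affected by rounding.
        assert iterate(7, Fraction(1, 10), steps) == closed_form(7, Fraction(1, 10), steps)
    print("closed form matches the iteration")
```

Since the iterated threshold is $2^{ℓ} c_{3} - 1$, the event $Φ (W) \geq 2^{ℓ} c_{3}$ is contained in the event that $Φ (W)$ exceeds the iterated threshold, which gives the stated bound.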

Partition the set of all Turing machines into sets $\mathcal{W}_{1}, \mathcal{W}_{2}, \dots$ such that $\mathcal{W}_{ℓ}$ contains all Turing machines expressed in at least $2^{ℓ}$, but fewer than $2^{ℓ + 1}$, bits. The probability that a Turing machine $W$ in $\mathcal{W}_{ℓ}$ violates $| r (m, W) - p | < \frac{c_{3} k (W) \sqrt{\log \log m}}{\sqrt{m}}$ (call this inequality $(⋆)$) for any $m \geq 3$ is at most $ε^{2^{ℓ}}$. The number of Turing machines in $\mathcal{W}_{ℓ}$ is at most $2^{2^{ℓ + 1}}$, so the probability that there is any $W \in \mathcal{W}_{ℓ}$ and $m \geq 3$ which violate $(⋆)$ is at most $ε^{2^{ℓ}} 2^{2^{ℓ + 1}}$.

Therefore, the probability that there is any Turing machine $W$ and $m \geq 3$ which violate $(⋆)$ is at most $\sum_{ℓ \in \mathbb{N}} ε^{2^{ℓ}} 2^{2^{ℓ + 1}} = \sum_{ℓ \in \mathbb{N}} (4 ε)^{2^{ℓ}}.$ As $ε$ goes to 0 this sum goes to 0, so for large enough $c_{3}$, the probability that $(⋆)$ holds for all $W$ and $m$ goes to 1. Therefore, with probability 1, there exists a $c$ such that $| r (m, W) - p | < \frac{c k (W) \sqrt{\log \log m}}{\sqrt{m}}$ for all $m$ and $W$.
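The union bound above uses $2^{2^{ℓ + 1}} = 4^{2^{ℓ}}$, so each term is $(4 ε)^{2^{ℓ}}$, and for $ε < \frac{1}{4}$ the terms shrink doubly exponentially. A minimal numerical sketch follows. The function name, the choice to start the sum at $ℓ = 0$, and the truncation of the infinite sum are assumptions made for illustration.

```python
def union_bound(eps: float, start: int = 0, max_terms: int = 64) -> float:
    """Truncated sum of (4 * eps) ** (2 ** l) for l = start, start + 1, ...

    Requires 0 < eps < 1/4 so that the terms decrease; the loop stops early
    once a term underflows to zero in floating point.
    """
    if not 0 < eps < 0.25:
        raise ValueError("eps must lie strictly between 0 and 1/4")
    total = 0.0
    for l in range(start, start + max_terms):
        term = (4 * eps) ** (2 ** l)
        if term == 0.0:
            break
        total += term
    return total


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        # Compare against the geometric-series bound 4 eps / (1 - 4 eps).
        print(eps, union_bound(eps), 4 * eps / (1 - 4 * eps))
```

Because $(4 ε)^{2^{ℓ}} \leq (4 ε)^{ℓ + 1}$ when $4 ε < 1$, the sum is at most $\frac{4 ε}{1 - 4 ε}$, which makes the "goes to 0" step explicit.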

Vanessa Kosoy (10y):

It appears to me that there is a natural analogue of the concept of irreducible pattern in the language of average case complexity. Moreover, the calibration theorem for optimal predictor schemes implies they pass the Benford test in the associated sense. I'll write it down carefully and post...

Vanessa Kosoy (10y):

There is a typo in the text: "We say that S is an ??? with probability p." I guess this is supposed to be "irreducible pattern"?

Btw, it seems that the definition makes sense for arbitrary promise problems, you don't have to consider provability in particular.

AI ALIGNMENT FORUM