This post is part of the Asymptotic Logical Uncertainty series. In this post, I will talk more about the exact assumptions we need to make about the sequence of sentences in the Benford Test.
I will start by making a few changes to the previous notation. First, we no longer need the output of to be computable in bounded time. From now on, we can just say that is a Turing machine that on input eventually accepts if is provable, eventually rejects if is disprovable, and runs forever otherwise.
We will also change the sequence of sentences in the Benford test. Instead of "The first digit of in base 10 is 1," we will use "The first digit of in base 10 is 1." This way, the reasonable answer to every question is as opposed to before the powers of 10 were an exception. This change does not make the Benford test weaker, and the program we present will also pass the original Benford test, but it will make many definitions have fewer messy cases.
Finally, we will switch to deterministic machines. Instead of making a machine which outputs 1 with the correct probability, we will have a machine that outputs a probability. This clearly makes the problem no easier.
Here is the new version of the Benford test:
Let be a Turing machine which on input runs quickly and outputs a probability , which represents the probability assigned to . We say that satisfies the Benford test if where ``The first digit of is a 1.''
In this post, we will not present a solution to the Benford test, but will be very explicit about the assumptions we need to make.
The fact that the best probability to assign to is is dependent not only on the fact that frequency with which is true tends to , but also on the fact that the sequence of truth values of contains no patterns that can be used to quickly compute a better probability on some subsequence. We therefore assume that this sequence of truth values is indistinguishable from a a sequence produced by a coin that outputs "true" with probability . Formally, we are assuming that is an irreducible pattern with probability as defined below.
Fix a universal Turing machine and an encoding scheme for machines, and let denote running the machine to simulate with input .
Let be an infinite subset of natural numbers such that is provable or disprovable for all , and there exists a Turing machine which on input runs in time and accepts if and only if . We say that is an irreducible pattern with probability if there exists a constant such that for every positive integer and every quickly computable subset with at least elements, we have where is the number of bits necessary to encode a Turing machine such that, for , accepts in time if and only if , and is the probability that~ is provable when is chosen uniformly at random from the first elements of .
This may seem like an unmotivated definition, but there is a good reason for it. It comes from the Law of the Iterated Logarithm. A coin that outputs true with probability will pass this test with probability 1. The definition is tight in the sense that we cannot replace the right hand side with something that diminishes more quickly as increases. It is also important to note that while we think that this is a good definition of the subsequence being quickly indistinguishable from a coin, we really only need it as a necessary condition, so that the sequence of Benford sentences is an irreducible pattern.
Theorem: If we replace provability in the definiton of irreducible pattern with random process, such that for each the sentence is independently called "provable" with probability , then would almost surely be an irreducible pattern with probability .
Proof: Fix a Turing machine . By the law of the iterated logarithm, there exists a constant such that almost surely. Therefore almost surely. We will use as a short hand for this supremum. For any , there therefore exists a such that with probability at most .
We now show that the probability that is at most . It suffices to show that the probability of given is at most . Let be the first such that It suffices to show that the probability that there exists an with is at most .
Observe that and that the probability there exists an with is the same as the probability that , which is at most .
We have thus shown that for every there exists a constant such that the probability that is at most .
Partition the set of all Turing machines into sets such that contains all Turing machines expressed in at least , but fewer than bits. The probability that a Turing machine in violates (call this equation ) for any is at most . The number of Turing machines in is at most , so the probability that there is any and which violate () is at most .
Therefore, the probability that there is any Turing machine and which violate () is at most For small enough this goes to 0, so for large enough , the probability that () holds for all and goes to 1. Therefore, with probability 1, there exists a such that for all and .