Asymptotic Logical Uncertainty: A Modification to the Demski Prior

Scott Garrabrant

Part of the Asymptotic Logical Uncertainty series. Abram Demski proposed a computably approximable coherent probability assignment in 2012. In this post, I will present a modification developed last year by Benja Fallenstein, Abram Demski, and myself, which samples Turing machines rather than logical sentences. I will further give a concrete computable approximation of it which is a logical predictor.

Fix a UTM $U$ , which has a read only input tape with an infinite string of 0s and 1s, and a one way write only output tape which outputs a possibly infinite string of 0s and 1s and #s. We build a coherent probability distribution as follows.

Start with an empty set $T$ of logical sentences. (Or a set containing some logical axioms you wish to start with.) One at a time sample a random infinite string $w$ of 0s and 1s and use it as input for $U$ . Interpret the output $U (w)$ as a finite or infinite string of logical sentences separated by # symbols. If $T$ is consistent with all the sentences in $U (w)$ simultaneously, then modify $T$ by adding in all the sentences in $U (w)$ . Resample and repeat infinitely many times.

The probability assigned to each logical sentence is the probability that it is eventually added to $T$ in the above process. Let $P (ϕ)$ denote the probability assigned to $ϕ$ .

The step where we kept or threw out $U (w)$ based on whether or not it was consistent with $T$ required a halting oracle. However, we can approximate the above process. The following procedure $M$ takes in an time $t$ and outputs a list of sentences. It has the property the limit as $t$ goes to infinity of the probability that $M (t)$ outputs $ϕ$ converges to $P (ϕ)$ .

On input $t$ , sample $2^{t}$ binary words $w_{1}, \dots, w_{2^{t}}$ . Run $U$ for $2^{t}$ time steps on each word to get $2^{t}$ finite lists of sentences, $ℓ_{1}, \dots, ℓ_{t}$ . Let $K$ be a proof checker which takes a list of sentences and searches for contradiction in those sentences. Let $T$ start as an empty set of sentences. For $i = 1, \dots, 2^{t}$ run $K$ for $2 ↑↑ t$ time steps on $T \cup ℓ_{i}$ . If it does not find a contradiction, modify $T$ by adding in all the sentences in $ℓ_{i}$ .

We make three assumptions about $K$ . First, we assume that for any fixed list of sentences, $K$ eventually finds a contradiction if and only if a contradiction exists. Second, we assume that for sufficiently large $t$ , $K$ observes any propositional contradictions in the time we give it. By propositional contradictions, I mean that $K$ should observe a contradiction if it can be proven using only propositional calculus. Third, we assume that if $K$ observes a contradiction in $T \cup ϕ$ and $T \cup \neg ϕ$ , then $K$ should observe a contradiction in $T$ in the same amount of time. Only the first assumption is necessary for the approximation to converge to the correct thing, but the other two will be useful for uniform coherence.

This approximation procedure only gives a random process which either contains or does not contain a sentence $ϕ$ with probability converging to $P (ϕ)$ . We can turn this into a procedure which outputs probabilities. Let $N$ be a machine which on input $t$ , for each $ϕ$ which $M (t)$ outputs with probability at least $2^{- t}$ , computes the probability that $M (t)$ outputs each $ϕ$ (within an error of $2^{- t}$ ), and outputs the ordered pair containing $ϕ$ and this probability. This can be done by going through all possible execution paths for $M (t)$ until the remaining paths have probability less than $2^{- t}$ .

Finally, we can turn this into a logical predictor as defined in the previous post by running $N$ on input $t$ for all natural numbers $t$ in increasing order, and outputting a single # symbol between each pair of successive values of $t$ . In the next post, we will show that this logical predictor is uniformly coherent.

AI ALIGNMENT FORUM
AF

Asymptotic Logical Uncertainty: A Modification to the Demski Prior

3

3