
Follow-up to All Mathematicians are Trollable.

It is relatively easy to see that no computable Bayesian prior on logic can converge to a single coherent probability distribution as we update it on logical statements. Furthermore, the non-convergence behavior is about as bad as it could be: someone selecting the ordering of provable statements to update on can drive the Bayesian's beliefs arbitrarily up or down, arbitrarily many times, despite only saying true things. I called this wild non-convergence behavior "trollability". Previously, I showed that if the Bayesian updates on the provability of a sentence rather than updating on the sentence itself, it is still trollable. I left open the question of whether some other side information could save us. Sam Eisenstat has closed this question, providing a simple logical prior and a way of doing a Bayesian update on it which (1) cannot be trolled, and (2) converges to a coherent distribution.


Major parts of this post were cribbed from Sam's notes, with some modifications.

Construction

Set some prior on sampling individual sentences, $P_0$, which gives positive probability to every sentence. Now sample an infinite sequence of sentences $s_1, s_2, \ldots$ by, at each step, rejection-sampling from $P_0$ with the requirement of propositional consistency with the sentences sampled so far. The prior probability of a sentence $\phi$, $\mathbb{P}(\phi)$, is the probability that $\phi$ appears anywhere in this sequence.
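As a concrete illustration, here is a minimal Python sketch of the construction. Everything in it is a stand-in of mine, not part of the post: sentences are strings, `base` is a finite dict approximating $P_0$, and `propositionally_consistent` is a stub for a SAT-style check.

```python
import random

def sample_from_base(base):
    """Draw one sentence from the base prior P_0.

    `base` is a dict mapping sentences to probabilities -- a finite
    stand-in for a prior giving positive probability to every sentence."""
    r, cumulative = random.random(), 0.0
    for sentence, p in base.items():
        cumulative += p
        if r < cumulative:
            return sentence
    return sentence  # guard against floating-point round-off

def sample_sequence_prefix(base, propositionally_consistent, length):
    """Rejection-sample a prefix s_1, ..., s_length of the sequence:
    a draw is kept only if it is propositionally consistent with the
    sentences sampled so far."""
    seq = []
    while len(seq) < length:
        candidate = sample_from_base(base)
        if propositionally_consistent(seq + [candidate]):
            seq.append(candidate)
    return seq
```

The prior $\mathbb{P}(\phi)$ can then be estimated as the fraction of long sampled prefixes in which $\phi$ appears.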

This is nothing new -- it's just the old Demski prior with propositional consistency only. However, Sam's idea is to interpret the sequence of sentences as the sequence of things Nature tells us / the sequence of things which is proved. (Why didn't I think of that?) Thus, in addition to the raw probability distribution over sentences, we use the probability distribution over sequence locations. Start with $n = 0$. When a sentence $\phi$ is proved (or otherwise observed), we perform a Bayesian update on $s_{n+1} = \phi$, and increment $n$.
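Since updating on nature's announcements just pins down a longer prefix of the sequence, the posterior can be approximated by Monte Carlo. A rough sketch, reusing the hypothetical helpers from the code above; the finite `horizon` truncates the infinite sequence, so this only approximates (from below) the event that `query` ever appears:

```python
def sample_continuation(base, propositionally_consistent, prefix, steps):
    """Extend a propositionally consistent prefix by `steps` more
    rejection-sampled sentences."""
    seq = list(prefix)
    while len(seq) < len(prefix) + steps:
        candidate = sample_from_base(base)
        if propositionally_consistent(seq + [candidate]):
            seq.append(candidate)
    return seq

def posterior_estimate(base, propositionally_consistent, observed, query,
                       horizon=200, trials=1000):
    """Monte-Carlo estimate of P(query | s_1, ..., s_n = observed):
    conditioning on the announcements so far means fixing the prefix,
    so we sample continuations and count how often `query` appears."""
    hits = 0
    for _ in range(trials):
        seq = sample_continuation(base, propositionally_consistent,
                                  observed, horizon)
        if query in seq:
            hits += 1
    return hits / trials
```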

This distribution can be computed to any desired accuracy in finite time. The probability of a sequence prefix can be computed to within $\epsilon$ by computing the normalization factor for $P_0$ at each step in the sequence to within sufficient accuracy to ensure the accuracy of the whole, by enumerating sentences and checking which are propositionally consistent with the sequence so far. The joint probability of any finite set of ordering assertions and unordered sentence assertions can be computed to within $\epsilon$ by enumerating the sequences of sentence selections by which these can become jointly true and by which they can become false, and calculating their probabilities with increasing accuracy, until the sum of the probability of the ways they can become true and the ways they can become false is known, with certainty, to be within $\epsilon$ of 1.
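In symbols, with $Z_i$ (my notation, not the post's) for the normalization factor at step $i$, the probability of a prefix is

$$\mathbb{P}(s_1 = \phi_1, \ldots, s_k = \phi_k) \;=\; \prod_{i=1}^{k} \frac{P_0(\phi_i)}{Z_i}, \qquad Z_i \;=\; \sum_{\psi \,:\, \{\phi_1, \ldots, \phi_{i-1}, \psi\} \text{ prop. consistent}} P_0(\psi),$$

and each $Z_i$ can be sandwiched by enumeration: the consistent mass found so far is a lower bound, and adding the not-yet-enumerated tail of $P_0$ gives an upper bound.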

Untrollability

So long as the probability of a sentence $\phi$ is not yet 1 or 0, it has probability at least $P_0(\phi)$, since $\phi$ could be sampled next. Similarly, its probability is at most $1 - P_0(\neg\phi)$. Hence, it is not possible to drive the probability arbitrarily up or arbitrarily down, no matter what order we prove things in.
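Spelled out, with $s_{1:n}$ for the announcements so far and $Z_{n+1} \leq 1$ the normalization factor at the next step, the lower bound is

$$\mathbb{P}(\phi \mid s_{1:n}) \;\geq\; \mathbb{P}(s_{n+1} = \phi \mid s_{1:n}) \;=\; \frac{P_0(\phi)}{Z_{n+1}} \;\geq\; P_0(\phi),$$

and the same bound applied to $\neg\phi$ (which, if sampled, rules out $\phi$) gives $\mathbb{P}(\phi \mid s_{1:n}) \leq 1 - P_0(\neg\phi)$.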

Convergence

Note that $\mathbb{P}$ expects nature to eventually decide every proposition, in which case convergence to a single distribution is trivial: beliefs converge to 0 or 1 on every proposition. However, the posterior probabilities also converge to a single distribution even if some sentences are never updated on -- as is the case when we restrict ourselves to updating on provable sentences.

To see this, take some sentence $\phi$ and some $\epsilon > 0$. We want to show that for all sufficiently large $n$, we have $|\mathbb{P}(\phi \mid s_{1:m}) - \mathbb{P}(\phi \mid s_{1:n})| < \epsilon$ for all $m > n$. If $\phi$ is eventually decided, this is trivially true. Otherwise, let $D$ be a large finite collection of sentences which will never be decided by the environment, chosen so that the probability of sampling a sentence that will never be decided but that is outside of $D$ before sampling $\phi$ or $\neg\phi$ (as we continue adding to the sequence by sampling) is less than $\epsilon$, no matter what other sentences have been decided already. Then, pick $n$ large enough that (1) the sentences announced after time $n$ do not announce any new logical relations between the sentences in $D \cup \{\phi\}$; and, (2) the probability of deciding any new sentence not in $D$ before deciding $\phi$ is less than $\epsilon$. (1) is possible since there are only $2^{|D|+1}$ joint assignments of truth values to sentences in $D \cup \{\phi\}$, so after finite time all joint assignments which will ever be ruled out have already been. (2) is possible since the probability of sampling a sentence that will never be decided but that is outside of $D$ is already small enough, and the probability of deciding all the rest of the sentences outside of $D$ only goes down as more sentences get decided, approaching zero, so that it is small enough for some $n$.

So, for $m > n$, the probability that $\phi$ is decided before any sentence outside of $D$ is at least $1 - \epsilon$, ensuring that any dependence of the posterior on the announcements after time $n$ is less than $\epsilon$.
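Putting the pieces together, for every $\epsilon > 0$ there is an $n$ with

$$\left| \mathbb{P}(\phi \mid s_{1:m}) - \mathbb{P}(\phi \mid s_{1:n}) \right| < \epsilon \quad \text{for all } m > n,$$

so the posterior probabilities of $\phi$ form a Cauchy sequence and converge.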

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering! For any ordering of assertions, we can find an $n$ as specified, and the joint probability distribution on $D$ (and $\phi$) will be the same to within $\epsilon$.

We can also see that if the environment eventually asserts every provable sentence of a theory $T$, then the limit which we converge to must be a distribution on completions of $T$: if $T \vdash \phi$, then there is some $n$ such that $s_n = \phi$, so the posterior probability of $\phi$ is $1$ then and beyond. Similarly, although we only require propositional coherence of $\mathbb{P}$, the limit will be fully coherent so long as the environment eventually asserts all logical truths (whatever else it may assert).

Comments

I at first didn't understand your argument for claim (2), so I wrote an alternate proof that's a bit more obvious/careful. I now see why it works, but I'll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of $D$ by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in $D$.

Instead of choosing $D$ as you describe, pick $D$ so that the probability of sampling something in $D \cup \{\phi, \neg\phi\}$ is greater than $1 - \epsilon \cdot P_0(\phi)$. Then, the probability of sampling something outside of $D \cup \{\phi, \neg\phi\}$ is less than $\epsilon \cdot P_0(\phi)$. Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from $P_0$ selects $\phi$ or $\neg\phi$ before it selects any other sentence outside of $D$ is at least

$$\frac{P_0(\phi)}{P_0(\phi) + \epsilon \cdot P_0(\phi)} \;=\; \frac{1}{1 + \epsilon} \;\geq\; 1 - \epsilon,$$

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn't notice that aspect of things. That's pretty cool.