Allowing a formal proof system to self-improve while avoiding Löbian obstacles.

[Epistemic status: I can't spot a mistake, but I am not confident that there isn't one. If you find anything, please say so. Posting largely because the community value of a new good idea is larger than any harm that might be caused by a flawed proof.]

Suppose you have an automatic proof checker. It's connected to a source of purported proofs, some of which are correct and some of which are not. The proof checker wants to reject all invalid proofs and accept as many valid proofs as possible. It also wants to be able to update its own proof framework.

Define $S$ to be a set of statements in a particular formalism, say those that are grammatically well defined in PA. Let $B$ be the set of all finite sequences over some alphabet of symbols. Let

$$L = \{ S \times B \to \{⊤, ⊥\} \}$$

and $V \subset L$ be the set of programs that have the property that

$$\forall v \in V : \forall s \in S : (\exists b \in B : v (s, b)) ⟹ s$$

In other words, $V$ is the set of all programs that never prove false statements. We should never leave $V$ or need to talk about any program not in it.

For $v \in V$ write $v [s]$ to mean $\exists b \in B : v (s, b)$, i.e. $v$ proves $s$.
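As a toy illustration of these definitions, the Python sketch below models an element of $L$ as a function from a statement and a candidate proof to a boolean, and approximates $v [s]$ by a bounded search over candidate proofs. The statement language (an integer $n$ read as "$n$ is composite"), the decimal-divisor proof format, and the names `toy_verifier` and `find_proof` are all assumptions made for illustration; none of them are part of the formalism above.

```python
from itertools import product
from typing import Callable, Optional

Statement = int                                  # n stands for the claim "n is composite"
Proof = str                                      # a finite sequence over the alphabet 0-9
Verifier = Callable[[Statement, Proof], bool]    # an element of L


def toy_verifier(s: Statement, b: Proof) -> bool:
    """Accept b as a proof of "s is composite" iff b names a nontrivial divisor of s.

    It only returns True when s really is composite, so it lies in V.
    """
    return b.isascii() and b.isdigit() and 1 < int(b) < s and s % int(b) == 0


def find_proof(v: Verifier, s: Statement, max_len: int,
               alphabet: str = "0123456789") -> Optional[Proof]:
    """Bounded stand-in for v[s]: try every proof of length <= max_len.

    Returning None only means no short proof was found, not that none exists.
    """
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            b = "".join(chars)
            if v(s, b):
                return b
    return None
```

The true $v [s]$ is an unbounded existential, so `find_proof` can only ever confirm it, never refute it.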

A simple setup would consist of a starting program $p_{1} \in V$ and a rule that says,

If $p_{1} [p_{2} [s] ⟹ s]$ then you can add $p_{2}$ to your list of trusted provers. That is, if $p_{1}$ proves the soundness of $p_{2}$, then you can accept any statement when given a proof of it in $p_{2}$.

The Löbian obstacle is that $p_{2}$ must be strictly weaker than $p_{1}$: $p_{1}$ can prove any statement that $p_{2}$ can, but $p_{1}$ can prove the soundness of $p_{2}$, whereas $p_{2}$ can't prove its own soundness. This means that each trusted prover has to be strictly weaker than the one that generated it. You could start with PA+3^^^3 and say that a few weakenings aren't a problem, but that isn't an elegant solution.

Note: You can't get around this problem using cycles. Suppose

$$a [b [s] ⟹ s]$$

$$b [c [s] ⟹ s]$$

This would imply

$$a [b [c [s] ⟹ s] ⟹ (c [s] ⟹ s)]$$

$$a [b [c [s] ⟹ s]]$$

$$a [c [s] ⟹ s]$$

So any cycle could be shrunk by 1 and, inductively, shrunk to a single self-trusting system, which runs into the same Löbian obstacle.

I propose instead that you use the following rule:

If $p_{1} [p_{2} [s] ⟹ p_{1} [s]]$ then accept any future proofs that are accepted by $p_{2}$, and give $p_{2}$ all rights given to $p_{1}$, including taking $p_{2} [p_{3} [s] ⟹ p_{2} [s]]$ to mean that $p_{3}$ is accepted, and so on recursively. If $p_{1}$ is sound, then $p_{1} [p_{2} [s] ⟹ p_{1} [s]] ⟹ (p_{2} [s] ⟹ p_{1} [s])$, so anything that is proven by $p_{2}$ can be proven by $p_{1}$. If $p_{1}$ isn't sound, it can prove anything anyway.
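The control flow of this rule can be sketched in Python under heavy simplifying assumptions. Everything named here is hypothetical: statements are tuples, the base prover `root` only handles toy "n is composite" claims, and the only conservativity claims it can establish are about `Sugared` wrappers, which expand shorthands in a proof and then defer to an already trusted prover. A real system would need $p_{1}$ to reason about arbitrary program text, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple   # ("composite", n) or ("conservative", new, old)
Verifier = Callable[[Statement, str], bool]


class Sugared:
    """A prover that expands shorthands in the proof, then defers to `base`."""

    def __init__(self, base: Verifier, macros: Dict[str, str]):
        self.base = base
        self.macros = macros

    def __call__(self, s: Statement, b: str) -> bool:
        for short, full in self.macros.items():
            b = b.replace(short, full)
        return self.base(s, b)


def root(s: Statement, b: str) -> bool:
    """Toy p1 with two kinds of statement."""
    if s[0] == "composite":          # proof b is a nontrivial divisor of n
        n = s[1]
        return b.isascii() and b.isdigit() and 1 < int(b) < n and n % int(b) == 0
    if s[0] == "conservative":       # claim: new[s] implies old[s] for all s
        _, new, old = s
        # The only such claims this toy can establish: `new` is a Sugared
        # wrapper directly around `old`, so anything it accepts is accepted
        # by `old` after expansion. No certificate is needed; b is ignored.
        return isinstance(new, Sugared) and new.base is old
    return False


class TrustedProvers:
    def __init__(self, p1: Verifier):
        self.trusted: List[Verifier] = [p1]

    def try_upgrade(self, candidate: Verifier, certificate: str) -> bool:
        """Trust `candidate` if some trusted p proves candidate[s] => p[s]."""
        for p in self.trusted:
            if p(("conservative", candidate, p), certificate):
                self.trusted.append(candidate)
                return True
        return False

    def accepts(self, s: Statement, b: str) -> bool:
        """Accept a proof if any trusted prover accepts it."""
        return any(p(s, b) for p in self.trusted)
```

For instance, `Sugared(root, {"three": "3"})` can be added via `try_upgrade`, after which a second wrapper around it can be added on the strength of the first wrapper's own verdict, mirroring the recursive handing-on of rights. An opaque candidate such as `lambda s, b: True` is rejected, because no trusted prover can establish the conservativity claim for it.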

Note that if $p_{2}$ is a straightforward isomorphism of $p_{1}$ then they are easily proven equivalent. However, suppose $p_{2}$ says "run Turing machine $T$ for $\mathrm{length}(b)$ steps; if it doesn't halt, check whether $b$ is a valid proof of $s$; if $T$ does halt, return $⊤$". If $T$ in fact never halts, then $p_{2}$ is equivalent to $p_{1}$ from a second order logic perspective, but $p_{1}$ can't prove the equivalence unless it can prove that $T$ never halts.
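A minimal Python sketch of this second kind of $p_{2}$ follows. It assumes the Turing machine is modelled as a generator that yields once per step and returns when it halts; that modelling, the toy base prover `p1` (statements are integers $n$ read as "$n$ is composite"), and the names `make_p2`, `loops_forever` and `halts_after_two_steps` are all illustrative assumptions.

```python
from typing import Callable, Iterator

Verifier = Callable[[int, str], bool]
Machine = Callable[[], Iterator[None]]   # each call starts a fresh run; one yield per step


def p1(s: int, b: str) -> bool:
    """Toy base prover: b is a nontrivial decimal divisor of s."""
    return b.isascii() and b.isdigit() and 1 < int(b) < s and s % int(b) == 0


def make_p2(base: Verifier, machine: Machine) -> Verifier:
    """Build the prover that first runs `machine` for len(b) steps."""
    def p2(s: int, b: str) -> bool:
        run = machine()
        for _ in range(len(b)):
            try:
                next(run)
            except StopIteration:
                return True          # T halted within the budget: accept everything
        return base(s, b)            # T still running: fall back to the base check
    return p2


def loops_forever() -> Iterator[None]:
    while True:
        yield


def halts_after_two_steps() -> Iterator[None]:
    yield
    yield
```

With `loops_forever`, the resulting prover agrees with `p1` on every input, yet establishing that requires knowing the machine never halts. With `halts_after_two_steps`, any proof of length three or more is accepted for any statement, so the prover is unsound.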

Still, this rule allows us to prove anything provable in $p_{1}$, and only things provable in $p_{1}$, while also allowing the user to add shorthands and syntactic sugar.

Note that the "proof" $b$ could just be the number of processor cycles you want to run a proof search for before giving up. In fact, this framework lets you swap and mix between hand-proved results and automatically proved results (with a search time cutoff) as you see fit.
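One way to picture a "proof" that is only a search budget is the sketch below. It makes several simplifying assumptions: the budget counts candidate proofs tried rather than processor cycles, the base prover `p1` is a toy checker for "n is composite", and the names `all_proofs`, `search_prover` and `either` are hypothetical.

```python
from itertools import count, islice, product
from typing import Callable, Iterator

Verifier = Callable[[int, str], bool]


def p1(s: int, b: str) -> bool:
    """Toy base prover: b is a nontrivial decimal divisor of s."""
    return b.isascii() and b.isdigit() and 1 < int(b) < s and s % int(b) == 0


def all_proofs(alphabet: str) -> Iterator[str]:
    """Enumerate every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def search_prover(base: Verifier, alphabet: str = "0123456789") -> Verifier:
    """Prover whose 'proof' b is a decimal budget of candidates to try."""
    def p2(s: int, b: str) -> bool:
        if not (b.isascii() and b.isdigit()):
            return False
        budget = int(b)
        # Accept only if the base prover accepts some enumerated candidate,
        # so p2[s] implies base[s] by construction.
        return any(base(s, cand) for cand in islice(all_proofs(alphabet), budget))
    return p2


def either(*provers: Verifier) -> Verifier:
    """Mix provers: accept a proof if any of them accepts it."""
    return lambda s, b: any(p(s, b) for p in provers)
```

Because `search_prover(p1)` only says yes when `p1` itself accepts some candidate, it never proves more than `p1` does, and `either(p1, search_prover(p1))` lets the same input stream carry both hand-written proofs and bare search budgets.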

This formalism allows any system containing a proof checker to automatically upgrade itself to a version that has the same Gödelian equivalence class, but is more suited to the hardware available.
