AI ALIGNMENT FORUM
AF

Probabilistic Payor Lemma? — AI Alignment Forum

Epistemic status: too good to be true? Please check my math.

We've known for a while that Löb's theorem fails when proof is relaxed to probabilistic belief. This has pros and cons. On the pro side, it means there's no Löbian Obstacle to probabilistic self-trust. On the con side, it means that some Löb-derived insights for proof-based decision theory don't translate to probabilistic decision theory, at least not as directly as one might hope. In particular, it appeared to dash hopes for probabilistic generalizations of the "Löbian handshake" for cooperation.

Recently, Andrew Critch wrote about the Payor Lemma, which allows for a very similar "modal handshake" without Löb's Theorem. The lemma was proved using the same modal assumptions as Löb's, so on the surface it may appear to be just a different method to achieve similar results, whose main advantage is that it is much easier to prove (and therefore explain and understand) than Löb's Theorem.

But, a natural question arises: does Payor's Lemma have a suitable probabilistic version?

I'll give an affirmative proof; but I haven't confirmed that the assumptions are reasonable to my satisfaction.

Setup

Let be a language in first-order logic, expressive enough to represent its sentences $s \in L$ as quoted terms $┌ s ┐$ , eg, through Gödel numbering; and with a probability function symbol on these terms, $p (┌ s ┐)$ , which can be equated with (some representation of) rational numbers, e.g. $p (┌ ⊤ ┐) = 1$ , $p (┌ s ┐) = \frac{1}{2}$ , etc. I also assume the system can reason about these rational numbers in the basic ways you'd expect.

For all $a, b \in L$ and all $r \in Q$ , we have:

If $⊢ a$ , then $⊢ p (┌ a ┐) = 1$ .
If $⊢ a \to b$ , then $⊢ p (┌ a ┐) \leq p (┌ b ┐)$ .

(These assumptions might look pretty minimal, but they aren't going to be true for every theory of self-referential probability; more on this later.)

Let $B (s)$ abbreviate the sentence $p (┌ s ┐) > c$ for any $s$ and some globally fixed constant $c$ strictly between 0 and 1. This is our modal operator. (This modal operator is often called "p-belief" with the constant $c$ named $p$ instead, but I wanted to reserve $p$ for the probability operator.)

Some important properties of $B$ :

Necessitation. If $⊢ s$ , then $⊢ B (s)$ , for any $s$ .

Proof: Since $⊢ s$ implies $⊢ p (s) = 1$ , and $c \in (0, 1)$ , we have $⊢ p (┌ s ┐) > c$ , which is to say, $⊢ B (s)$ . [End proof.]

Weak distrubitivity. If $⊢ x \to y$ , then $⊢ B (x) \to B (y)$ .

Proof: When $⊢ x \to y$ , we have $⊢ p (y) \geq p (x)$ , so $⊢ p (x) > c \to p (y) > c$ . [End proof.]

(Regular distributivity would say $B (x \to y)$ implies $B (x) \to B (y)$ . The assumption $⊢ x \to y$ is stronger than $B (x \to y)$ , so the above is a weaker form of distributivity.)

Theorem Statement

If $⊢ B (B (x) \to x) \to x$ , then $⊢ x$ .

Proof

$⊢ x \to (B (x) \to x)$ , by tautology $(a \to (b \to a))$ .
So $⊢ B (x) \to B (B (x) \to x)$ , from 1 by weak distributivity.
Suppose $⊢ B (B (x) \to x) \to x$ .
$⊢ B (x) \to x$ from 2 and 3, by propositional logic.
$⊢ B (B (x) \to x)$ from 4 by necessitation.
$⊢ x$ from 5 and 3.
[End proof.]

Discussion

Comparison to Original Proof

The proof steps mirror Critch's treatment very closely. The key difference is step 2, IE, how I obtain a statement like $⊢ □ x \to □ (□ x \to x)$ . Critch uses distributivity, which is not available to me. Here's what I could get instead:^[1]

$B (a \to b) \to (B (a) \to B (b))$ ?
- Suppose $B (a \to b)$ , ie, $p (┌ a \to b ┐) > c$ .
- Rewrite $p (┌ b \lor \neg a ┐) > c$ .
- Now suppose $B (a)$ , that is, $p (┌ a ┐) > c$ .
- $p (┌ \neg a ┐) < 1 - c$ .
- $p (┌ b \lor \neg a ┐) \leq p (┌ b ┐) + p (┌ \neg a ┐) < p (┌ b ┐) + 1 - c$ .
- $p (┌ b \lor \neg a ┐) - 1 + c < p (┌ b ┐)$ .
- $p (┌ b ┐) > p (┌ b \lor \neg a ┐) - 1 + c > c - 1 + c$ .
- $p (┌ b ┐) > 2 c - 1$ .
- So we only get:
  - $B_{c} (a \to b) \to (B_{c} (a) \to B_{d} (b))$ ,
  - where $B_{r} (s)$ abbreviates $p (┌ s ┐) > r$ and we have $d = 2 c - 1$ .

So, it seems attempted applications of distributivity will typically create weakened belief operators, which would get in the way of the proof (very similar to how probabilistic Löb fails).

However, the specific application we want happens to go through, due to a logical relationship between $a$ and $b$ ; namely, that $b$ is a weaker statement than $a$ .

This reveals a way in which the assumptions for Payor's Lemma are importantly weaker than those required for Löb to go through.

So, the key observation I'm making is that weak distributivity is all that's needed for Payor, and it seems much more plausible for self-referential probabilistic reasoning than regular distributivity.

Reasonable Axioms?

This proof shouldn't be regarded as a final result, since I'm not sure the probability axioms I gave are good. They're consistent: we can consistently assign probability 1 to everything, at least. But obviously that's not an intended model. We can fix this by adding the axiom $⊢ \neg a ⟹ ⊢ p (a) = 0$ . I excluded this simply because I don't need it in the proof. If we want to do full probability, we will also want to add something like the inclusion-exclusion principle. So our four axioms could be:

0. $⊢ \neg a ⟹ ⊢ p (a) = 0$

1. $⊢ a ⟹ ⊢ p (a) = 1$

2. $⊢ a \to b ⟹ ⊢ p (a) \leq p (b)$

3. $⊢ p (a) + p (b) = p (a \land b) + p (a \lor b)$

These are similar to the axioms for logical-probability in From Classical to Intuitionistic Probability.

But is the resulting system consistent? While the axioms may look extremely innocuous, the fact that they embed information about the probability distribution within the same logic which the probability distribution has beliefs over creates a possibly dangerous amount of self-reference. We need to worry about self-referential sentences like the "probabilistic Liar sentence" $λ$ :

λ := p (λ) < .5

We can derive something like the reflection schema in Definability of Truth in Probabilistic Logic, which is reassuring since stuff like $λ$ is consistent in that system:

If $⊢ a < p (┌ x ┐) < b$ , then $⊢ p (┌ a < p (┌ x ┐) < b ┐) = 1$ .

This is true since $⊢ s$ implies $⊢ p (┌ s ┐) = 1$ for any $s$ . But we can similarly derive a principle analogous to one rejected in that paper:

If $⊢ a \leq p (┌ x ┐) \leq b$ , then $⊢ p (┌ a \leq p (┌ x ┐) \leq b ┐) = 1$ .

The presence of the $⊢$ changes things enough that the relationship to the original paper isn't obvious.

The key difference between the kind of self-reference I need and the kind explored in Definability of Truth in Probabilistic Logic is that that paper focused on self-knowledge about the probability values themselves. What I need is self-knowledge about the rules of probability followed. Unfortunately, the paper doesn't address this, so more work is required to check whether my two axioms could be added consistently.

My hope is that the proof of Payor's Lemma I suggest shouldn't depend on too much esoteric stuff about how exactly we define self-referential probability and make it consistent. So, my conjecture would be that for many approaches to self-referential probability, some translation of Payor's Lemma works.

But my theorem above is only very limited evidence of this.

^{^}
I'm not being super careful about what axioms I need, below; I'm assuming something like "ordinary probabilistic reasoning". I'm not proving that I can't have full distributivity (which would require a different approach); I'm just illustrating why it doesn't seem to make sense for probabilistic beliefs.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

36

Probabilistic Payor Lemma?

36

Setup

Theorem Statement

Proof

Discussion

Comparison to Original Proof

Reasonable Axioms?