I would term "hope for " rather than "reliability", because it's about willingness to enact in response to belief in , but if is no good, you shouldn't do that. Indeed, for bad , having the property of is harmful fatalism, following along with destiny rather than choosing it. In those cases, you might want to or something, though that only prevents from being believed, that you won't need to face in actuality, it doesn't prevent the actual . So reflects a value judgement about reflected in agent's policy, something downstream of endorsement of , a law of how the content of the world behaves according to an embedded agent's will.
Payor's Lemma then talks about belief in hope, $\square(\square C \to C)$; that is, the hope itself is exogenous and needs to be judged (endorsed or not). Which is reasonable for games, since what the coalition might hope for is not anyone's individual choice: the details of this hope couldn't have been hardcoded in any agent a priori and need to be negotiated during the decision that forms the coalition. A functional coalition should be willing to act on its own hope (which is something we need to check for a new coalition, even though for a singular agent it might've already been the case); that is, we need to check that $\square(\square C \to C)$ is sufficient to motivate the coalition to actually enact $C$. This is again a value judgement about whether this coalition's tentative aspirations, being a vehicle for the hope that $C$, are actually endorsed by it.
Thus I'd term "coordination" rather than "trust", the fact that this particular coalition would tentatively intend to coordinate on a hope for . Hope is a value judgement about , and in this case it's the coalition's hope, rather any one agent's hope, and the coalition is a temporary nascent agency thing that doesn't necessarily know what it wants yet. The coalition asks: "If we find ourselves hoping for together, will we act on it?" So we start with coordination about hope, seeing if this particular hope wants to settle as the coalition's actual values, and judging if it should by enacting if at least coordination on this particular hope is reached, which should happen only if is a good thing.
(One intuition pump, with some limitations outside the provability formalism, is treating $\square C$ as "probably $C$", perhaps according to what some prediction market tells you. If "probably $C$" is enough to prompt you to enact $C$, that's some kind of endorsement, and it's a push towards increasing the equilibrium-on-reflection value of the probability of $C$, pushing "probably $C$" closer to reality. But if $C$ is terrible, then enacting it in response to its high probability is following along with self-fulfilling doom, rather than doing what you can to push the equilibrium away from it.)
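A toy numerical rendering of this intuition pump, as a sketch: the threshold, the policy, and the assumption that $C$ obtains exactly when enacted are all made up for illustration.

```python
# Toy rendering of the "probably C" intuition pump. Assumptions (made up for
# illustration): the agent enacts C once its probability clears a threshold,
# and C obtains exactly when the agent enacts it.

THRESHOLD = 0.9

def enacts(p_c: float) -> bool:
    """Policy: act on the belief once C looks sufficiently probable."""
    return p_c >= THRESHOLD

def world(enacted: bool) -> float:
    """Probability that C obtains, given the agent's choice (here: 1 iff enacted)."""
    return 1.0 if enacted else 0.0

def reflective_fixed_points(grid: int = 101) -> list[float]:
    """Beliefs p that already agree with what the policy-plus-world makes true."""
    candidates = (i / (grid - 1) for i in range(grid))
    return [p for p in candidates if abs(world(enacts(p)) - p) < 1e-9]

print(reflective_fixed_points())
# [0.0, 1.0]: enacting-on-belief makes high "probably C" self-fulfilling,
# whether C is wonderful or terrible.
```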
Löb's Theorem then says that if we merely endorse a belief by enacting the believed outcome (arrange $\square C \to C$, provably), this is sufficient for the outcome $C$ to actually happen, a priori and without that belief yet being in evidence. And Payor's Lemma says that if we merely endorse a coalition's coordinated hope by enacting the hoped-for outcome (arrange $\square(\square C \to C) \to C$, provably), this is sufficient for the outcome to actually happen, a priori and without the coordination around that hope yet being in evidence. The use of Löb's Theorem or Payor's Lemma is that the condition (belief in $C$, or coordination around hope for $C$) should help in making the endorsement; that is, it should be easier to decide to enact $C$ if you already believe that $C$, or if you already believe that your coalition is hoping for $C$. For coordination, this is important because every agent can only unilaterally enact its own part in the joint policy, so it does need some kind of premise about the coalition's nature (in this case, about the coalition's tentative hope for what it aims to achieve) in order to endorse playing its part in the coalition's joint policy. It's easier to decide to sign an assurance contract than to unconditionally donate to a project, and the role of Payor's Lemma is to say that if everyone does sign the assurance contract, then the project will in fact get funded sufficiently.
Löb's Theorem: if $\vdash \square C \to C$, then $\vdash C$; in its formalized form, $\square(\square C \to C) \to \square C$.
In the following discussion, I'll say "reality" to mean $C$, "belief" to mean $\square C$, "reliability" to mean $\square C \to C$ (ie, belief is reliable when belief implies reality), and "trust" to mean $\square(\square C \to C)$ (belief-in-reliability).
Löb says that if you have trust, you have belief: $\square(\square C \to C) \to \square C$.
Payor says that if you can prove that trust implies reality, then you have belief: if $\vdash \square(\square C \to C) \to C$, then $\vdash C$ (and hence, by necessitation, $\vdash \square C$).
So, both results give conditions for belief. Indeed, both results give conditions equivalent to belief, since in both cases the inference can also be reversed: $\square C \to \square(\square C \to C)$ (trust follows from mere belief), and $\vdash C$ implies $\vdash \square(\square C \to C) \to C$ (the Payor condition holds trivially once $C$ is a theorem).
Furthermore, both results relate reliability with belief, through the intermediary of trust.
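For reference, since the next paragraph leans on its step 2, here is the short proof of Payor's Lemma, in the notation above and following Critch's presentation:

```latex
% Proof of Payor's Lemma: from the assumption (step 3), conclude C.
\begin{align*}
1.\quad & \vdash C \to (\square C \to C)                 && \text{tautology}\\
2.\quad & \vdash \square C \to \square(\square C \to C)  && \text{from 1, by necessitation and distribution}\\
3.\quad & \vdash \square(\square C \to C) \to C          && \text{assumption}\\
4.\quad & \vdash \square C \to C                         && \text{from 2 and 3}\\
5.\quad & \vdash \square(\square C \to C)                && \text{from 4, by necessitation}\\
6.\quad & \vdash C                                       && \text{from 3 and 5}
\end{align*}
```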
Löb is usually thought of as a negative statement: that you can only have trust when you already have belief. One explanation of this is that Löb is the converse of the "trivial" case of trust, where you derive trust from simple belief: $\square C \to \square(\square C \to C)$. (Recall that this is the pivotal step 2 in the proof of Payor's lemma.) Löb simply reverses this; combining the two, we know that trust is synonymous with belief: $\square(\square C \to C) \leftrightarrow \square C$. "The only case where we can have trust is the trivial case: belief."
But this doesn't totally dash hopes of productive use of trust, as the negative reading of Löb might suggest. In the decision-theoretic application of Löb, we can take advantage of Löb to help ensure preferred outcomes:
For some desirable proposition $C$, we can arrange things so that $\square C \to C$ (reliability), and provably so, giving $\square(\square C \to C)$ (trust). Applying Löb, we get $\square C$ (belief). But we already arranged for $\square C \to C$; so now we get $C$ out of the deal. This can allow cooperative handshakes between logic-based agents.
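In symbols, the handshake runs as follows (a sketch, reading $\square$ as provability in the agents' shared theory):

```latex
% Arranging reliability for a desirable C, provably, forces C.
\begin{align*}
&\vdash \square C \to C           && \text{arranged: $C$ gets enacted upon being believed}\\
&\vdash \square(\square C \to C)  && \text{necessitation: the arrangement is itself provable (trust)}\\
&\vdash \square C                 && \text{Löb, $\square(\square C \to C) \to \square C$, applied to the line above}\\
&\vdash C                         && \text{modus ponens with the first line}
\end{align*}
```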
We can turn this into a general decision procedure by searching for the best $C$ we can make real in this way. More precisely, we search for possible implications of each action, and take the action for which we can find the most desirable implication. So we search for the relative provability of $C$, in the context of an assumption that we take an action; and if we like what we see, we make that context real by taking that action.
Admittedly, this is a weird and spooky way to describe a fairly intuitive algorithm (search for actions with good consequences); the point is to try and clarify the connection to Löb as best I can.
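Here is a minimal sketch of that procedure with the proof search stubbed out; the helper `proves`, the toy implication table, and the utilities are hypothetical stand-ins, not part of any real prover:

```python
# A sketch of the search-for-implications decision procedure. `proves`, the
# implication table, and the utilities are made-up stand-ins for a bounded
# proof search in the agent's theory.

from typing import Iterable, Optional

# Toy "proof database": pairs (action, outcome) for which a proof of
# "taking the action implies the outcome" is assumed to be findable.
KNOWN_IMPLICATIONS = {
    ("cooperate", "mutual_cooperation"),
    ("defect", "mutual_defection"),
}

# How much the agent values each outcome it might prove reachable.
UTILITY = {
    "mutual_cooperation": 3.0,
    "mutual_defection": 1.0,
}

def proves(action: str, outcome: str) -> bool:
    """Stand-in for: a proof of (action -> outcome) is found within the search bound."""
    return (action, outcome) in KNOWN_IMPLICATIONS

def decide(actions: Iterable[str], outcomes: Iterable[str]) -> Optional[str]:
    """For each action, look for the most desirable outcome provably implied by
    taking it; return the action with the best provable implication."""
    outcomes = list(outcomes)
    best_action, best_value = None, float("-inf")
    for a in actions:
        for c in outcomes:
            if proves(a, c) and UTILITY[c] > best_value:
                best_action, best_value = a, UTILITY[c]
    return best_action

if __name__ == "__main__":
    print(decide(["cooperate", "defect"], UTILITY))  # -> cooperate
```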
The big problem here is that none of the above degrades well under uncertainty. Reading Löb as "trust is synonymous with belief" gives us a hint that it can't apply to probabilistic reasoning; clearly, I can think "my belief about $C$ is very probably very accurate" without assigning a high probability to $C$. (In particular, if I assign a very low probability to $C$.)
This results in a practical problem of Löb-based cooperation being very fragile, particularly for the general decision procedure mentioned above. Results are very sensitive to the proof strength of the participants. Intuitively, it shouldn't matter much whether I use, say, PA or PA plus the consistency of PA as my base-level reasoning system, if I assign high probability to the soundness of both of those systems. For Löb-based handshakes, the difference can make or break cooperation.
Payor's Lemma looks much more amenable to probabilistic generalization, so it would be interesting to formulate a Payor-based decision procedure analogously to the Löb-based decision procedure mentioned earlier.
Where Löb is applied by making $\square C$ imply $C$ for desirable $C$, Payor's lemma is applied via connecting $\square(\square C \to C)$ with $C$. So where Löb suggests the strategy "make (good) beliefs imply reality", Payor suggests the strategy "make trust (in the good) imply reality".
Critch gave the example strategy "cooperate if you know that if everyone believed everyone was cooperating, then everyone would indeed cooperate". However, this is a fixed strategy for a known game (it requires a pre-existing definition of "cooperation"). How can we turn it into a general decision procedure which can be applied to a wide variety of decision problems?
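One way to write that fixed strategy down, as a sketch in a single-modality setting (Critch's post treats each agent's own modality more carefully): let $x_i$ stand for "agent $i$ cooperates" and $E = x_1 \land \dots \land x_n$ for "everyone cooperates".

```latex
% Each agent cooperates exactly when it verifies: "if everyone believed
% everyone was cooperating, then everyone would indeed cooperate."
\begin{align*}
&\vdash x_i \leftrightarrow \square(\square E \to E) && \text{the fixed strategy, for each } i\\
&\vdash E \leftrightarrow \square(\square E \to E)   && \text{conjoining over } i\\
&\vdash \square(\square E \to E) \to E               && \text{in particular}\\
&\vdash E                                            && \text{by Payor's Lemma: everyone cooperates}
\end{align*}
```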
Trying to force the analogy:
| | Löb | Payor |
|---|---|---|
| statement of theorem | $\square(\square C \to C) \to \square C$ | $\vdash \square(\square C \to C) \to C$ implies $\vdash C$ |
| condition for belief | $\square(\square C \to C)$ | $\vdash \square(\square C \to C) \to C$ |
| what we want to arrange to be (provably) true, for good $C$ | $\square C \to C$ | $\square(\square C \to C) \to C$ |
| condition of above | $\square C$ | $\square(\square C \to C)$ |
| condition outcome on action | $\square(A \to C) \to (A \to C)$ | $\square(\square(A \to C) \to (A \to C)) \to (A \to C)$ |
| what proofs to search for (looking for best $C$ for any $A$) | $\square(A \to C)$ | $\square(\square(A \to C) \to (A \to C))$ |
When $\square$ is interpreted as provability, these two decision procedures appear to be closely related, since trust is equivalent to belief ($\square(\square C \to C) \leftrightarrow \square C$) as I mentioned before. This might be a good sign; the Löb-based procedure works well for proof-based decision theory. Since this equivalence is itself based on Löb, however, it'll break down for the probabilistic case. I'm not yet sure how to derive an actual suggested decision procedure for the probabilistic case. I'm not even sure this is the right way to force the analogy. (I don't think it generalizes Critch's proposed strategy... but I'm not sure how one would do that without being given data on the other agents, which seems anti-naturalistic.)