x

A Löbian argument pattern for implicit reasoning in natural language: Löbian party invitations — AI Alignment Forum

Related to: Löb's Lemma: an easier approach to Löb's Theorem.

Natural language models are really taking off, and it turns out there's an analogue of Löb's Theorem that occurs entirely in natural language — no math needed. This post will walk you through the details in a simple example: a very implicit party invitation.

Important Caveat: Arguments in natural language are basically never "theorems". The main reason is that human thinking isn't perfectly rational in virtually any precisely defined sense, so sometimes the hypotheses of an argument can hold while its conclusion remains unconvincing. Thus, the Löbian argument pattern of this post does not constitute a "theorem" about real-world humans: even when the hypotheses of the argument hold, the argument will not always play out like clockwork in the minds of real people. Nonetheless, Löb's-Theorem-like arguments can play out relatively simply in the English language, and this post shows what would look like.

Motivation

(Skip this if you just want to see the argument.)

Understanding the structure here may be helpful for anticipating whether Löbian phenomena can, will, or should arise amongst language-based AI systems. For instance, Löb's Theorem has implications for the emergence of cooperation and defection in groups of formally defined agents (LaVictoire et al, 2014; Critch, Dennis, Russell, 2022). The natural language version of Löb could play a similar role amongst agents that use language, which is something I plan to explore in a future post. Aside from being fun, I'm hoping this post will make clear that the phenomenon underlying Löb's Theorem isn't just a feature of formal logic or arithmetic, but of any language that can talk about reasoning and deduction in that language, including English. And as Ben Pace points out here, invitations are often self-referential, such as when people say "You are hereby invited to the party": hereby means "by this utterance" (google search). So invitations a natural place to explore the kind of self-reference happening in Löb's Theorem.

This post isn't really intended as an "explanation" of Löb's Theorem in its classical form, which is about arithmetic. Rather, the arguments here stand entirely on their own, are written in natural language, and are about natural language phenomena. That said, this post could still function as an "explanation" of Löb's Theorem because of the tight analogy with it.

Implicitness

Okay, imagine there's a party, and maybe you're invited to it. Or maybe you're implicitly invited to it. Either way, we'll be talking a bunch about things being implicit, with phrasing like this:

"It's implicit that X",
"Implicitly X", or
"X is implicit".

These will all mean "X is implied by things that are known (to you) (via deduction or logical inference)".

Explicit knowledge is also implicit. In this technical sense of the word, "implicit" and "explicit" are not actually mutually exclusive: trivially implies $X$ , so if you explicitly observed $X$ in the world, then you also know X implicitly. If you find this bothersome or confusing, just grant me this anyway, or skip to "Why I don't treat 'implicit' and inexplicit' as synonyms here" at the end.

Abbreviations. To abbreviate things and to show there's a simple structure at play here, I'll sometimes use the box symbol " $□$ " as shorthand to say things are implicit:

" $□$ (cats love kittens)" will mean "It's implicit that cats love kittens"
" $□$ X" will mean "It's implicit that X"

A peculiar invitation

Okay! Let p be the statement "You're invited to the party". You'd love to receive such a straightforward invitation to the party, like some people did, those poo poo heads, but instead the host just sends you the following intriguing message:

If you're implicitly invited to the party, then you're invited.

Abbreviation: $□ p \to p$

Interesting! Normally, being invited to a party and being implicitly invited are not the same thing, but for you in this case, apparently they are. Seeing this, you might feel like the host is hinting around at implicitly inviting you, and maybe you'll start to wonder if you're implicitly invited by virtue of the kind of hinting around that the host is doing with this very message. Well then, you'd be right! Here's how.

For the moment, forget about the host's message, and consider the following sentence, without assuming its truth (or implicitness):

$Ψ$ :

If this sentence is implicit, then you’re invited to the party.

The sentence $Ψ$ has some cool properties that are going to let us write out a clever argument.

(Optional digression: Perhaps the apparent circularity of $□$ reminds you of an instance of Curry's paradox, like "If this sentence is true, then Santa Clause exists". The Santa Clause sentence is quite problematic, because if you try to assign it a truth value, you find that it must be true and that Santa Clause exists. If that's not obvious, or Curry's paradox doesn't bother you, just skip this paragraph and don't worry about it. If you are bothered by it, note that Curry's paradox is basically the content of Tarski's theorem on the undefinability of truth: if you try to let sentences talk about their own truth, you get paradoxes. But truth is not the same same as implicitness. It turns out sentences can talk about their own implicitness without leading to logical paradoxes, if "implicit" means what it means in this post: following by logical implication from things you know. )

The main observation we need to move forward is just this:

Line 0:

"If this sentence is implicit, then you’re invited." 
means the same thing as 
"If it's implicit that
	"If this sentence is implicit, then you’re invited."
then you’re invited.".

Abbreviation: $Ψ \leftrightarrow □ Ψ \to p$

Hopefully that's intuitively clear once you stare at it for 60 seconds. Just take the first sentence and unpack the "this", and you'll see it means the same thing as the second sentence. Please do give yourself a whole 60 seconds for this!

Another intuitive property we'll need is the following, which doesn't depend at all on the message from the host:

Line 1:

If it's implicit that 
	"If this sentence is implicit, then you’re invited."
then you're implicitly invited.

Abbreviation: $□ Ψ \to □ p$

Please give yourself at least another 60 seconds to stare at that one. In short, if the indented thing were implicit, then you'd know it was implicit so it'd be implicitly implicit, which would allow you to apply it in an implicit argument to conclude you're implicitly invited. In case that doesn't become intuitively clear, please see "Proving Line 1" at the end see see these steps explicitly :)

Anyway, how does this stuff get you invited to the party? Well, the host did say to you specifically:

Line 2 (host's message):

If you're implicitly invited to the party, then you're invited.

Abbreviation: $□ p \to p$

... which combines with Line 1 to give:

Line 3:

If it's implicit that 
	"If this sentence is implicit, then you’re invited"
then you're invited.

Abbreviation: $□ Ψ \to p$

By Line 0, this is the same as:

Line 4:

If this sentence is implicit, then you’re invited.

Abbreviation: $Ψ$

Now, being the clever person that you are, you realize that since all of the above were implied by things you knew, it's all implicit:

Line 5:

It's implicit that "If this sentence is implicit, then you’re invited.".

Abbreviation: $□ Ψ$

Finally, from Line 3 and Line 5, it follows that:

Line 6:

You're invited.

Abbreviation: $p$

Whoah, you're invited! Just because you thought about the weirdly indirect invitation long enough to infer the implication that you were invited.

Now that's my kind of party :)

Proving Line 1

Line 1 doesn't actually require the host's message at all. It follows directly from Line 0, like this:

Suppose:

It's implicit that "If this sentence is implicit, then you’re invited.".

Abbreviation: Suppose $□ Ψ$

When something's implicit, you can tell that it's implicit by working through the steps of the implication. From this supposition, we're going to conclude $□ p$ , and then drop the supposition to conclude $□ Ψ \to □ p$ .

From the supposition, we get:

Line 0.1:

It's implicit that 
	"It's implicit that 
		"If this sentence is implicit, then you’re invited."
	."

Abbreviation: $□ □ Ψ$

Also, by Line 0 and the supposition, we get

Line 0.2:

It's implicit that
	"If it's implicit that 
		"If this sentence is implicit, then you’re invited."
	then you're invited."

Abbreviation: $□ (□ Ψ \to p)$

Now lines 0.1 and 0.2 combine, still under the supposition, to give:

Line 0.3:

It's implicit that you're invited.

Abbreviation: $□ p$

Now, since we supposed $□ Ψ$ and deduced $□ p$ , we can drop the supposition and conclude an implication:

Line 1:

If it's implicit that 
	"If this sentence is implicit, then you’re invited."
then you're implicitly invited.

Abbreviation: $□ Ψ \to □ p$

Why I don't treat 'implicit' and 'inexplicit' as synonyms here

In this post, I counted explicit knowledge as a special case of implicit knowledge. Here are five reasons for this:

to avoid saying "implicit or explicit" everywhere, which is cumbersome and annoying to read;
because one of the definitions "implicit" is "implied (from context)", and explicitly known context will always trivially imply itself, making it technically also "implied from context";
for consistency with the concept of "implicit trust". Sometimes people say "I trust {something} implicitly", and stating that trust explicitly need not invalidate the fact that it was also strongly implied by context. The trust may be simultaneously implicitly known and explicitly stated.
because "inexplicit" is already a fine and unambiguous antonym for "explicit", so we needn't waste "implicit" to mean exactly the same thing;
to avoid introducing a more jargon-y term like "deducible" or "provable". Such jargon wouldn't naturally combine as well with concepts like "trust", "invitation", and "agreement", which I want to write about in future posts. Phrases like "I trust you provably" and "you're deducibly invited to the party" are much less natural combinations of normal words.

One draft reader complained that most usages of the word "implicit" are for referring to things that are inexplicit, suggesting (implicitly!) that the two words should be treated as synonyms. However, consider that most usages of the word "mammal" are used to refer to non-human animals, yet "mammal" is not a perfect antonym of "human", and in fact includes humans as a specific case. Yes, I will grant that calling an explicit thing "implicit" is usually misleadingly vague, the way it's misleadingly (and perhaps humorously) vague to say "I live with another mammal" when you actually live with another human. But that needn't mean that "mammal" should be an antonym of "human". Similarly, "implicit" needn't be treated as an "antonym" of explicit, especially when we have a perfectly good English word "inexplicit" that already unambiguously means "not explicit".

Finally, to expound more on point 5, "trusting someone implicitly" means that stating the trust explicitly is unnecessary because the trust is implied without doubt (example google results). That usually doesn't rule out the possibility of also stating the trust explicitly. For the sake of making crisp arguments about implicit trust and implicit agreements — which I'll want to do in some subsequent posts — treating implicit and explicit as opposites will be a recipe for very cumbersome language. So for simplicity, it will be better to use "implicit" for "things that follow via implication", and use "inexplicit" if you really want a single-world adjective to unambiguously call something "not explicit".

Recap and Takeaways

We let some sentences talk about their own implicitness. This didn't lead to a paradox, but did yield a nice conclusion about being invited to a party. The argument follows the same structure as a proof of Löb's Theorem, which can be found here, but did not "depend on" Löb's Theorem by invoking any math or theorems in formal logic.
Implicit here didn't mean the same thing as inexplicit. In particular, explicit observations constitute implicit knowledge, because it's implied by itself.
Plausibly, AI reasoning entirely in natural language will soon be capable of completing arguments like the above. Understanding the implications for such arguments may be important for understanding how language-based AI can, will, or should behave.