Update: This version of the Trolljecture fails too; see the counterexample due to Sam.

In An Informal Conjecture on Proof Length and Logical Counterfactuals, Scott discussed a "trolljecture" from a MIRI workshop, which attempted to justify (some) logical counterfactuals based on the lengths of proofs of various implications. Then Sam produced a counterexample, and Benja pointed to another counterexample.

But at the most recent MIRI workshop, I talked with Jonathan Lee and Holger Dell about a related way of evaluating logical counterfactuals, and we came away with a revived trolljecture!


Let's say we have a recursively enumerable set of axioms (in the language of PA) and a statement . Then we define the "distance" as the length of the shortest proof of using the axioms . (We set if there is no such proof.)

And now we define a valuation which we will interpret as the "probability" of the logical counterfactual that implies . (This is undefined when is undecidable given ; we could define it as a limit when searching over proofs of bounded length, in which case it would equal 0.5 in such cases. But as we shall see, this is not the case we care about.)

This valuation has the trivial properties that if and , and similarly if and . But what's more interesting are the properties that this valuation has if is inconsistent.

If there is a short proof of from , and a much longer proof of from , then . Moreover, we have an approximate coherence condition so long as either or has a proof from that is much shorter than the proof of from . (See the notes below.)

And this gets some key examples "correct" if we add the axioms that we actually care about (and which make the problem truly counterfactual). For instance, let's take Sam's counterexample to the original Trolljecture:

def U():
  if A() = 1:
    if PA is consistent:
      return 10
    else:
      return 0
  else: return 5

def A():
  if PA ⊢ A() = 1 → U() = 10:
    return 1
  else: return 2

In that case, actually returns 2 and returns 5, but PA proves the "incorrect" counterfactual that and does not prove the "correct" counterfactual that . (Read the linked post if this puzzles you.)

But what if we consider reasoning from the inconsistent premises that PA is consistent and yet ? In that case, I claim that is near 1 while is near 0.

To see this, note that there is a short proof from these axioms that (look at the definition of , apply the axiom that , apply the axiom , and conclude that ), but the shortest proof from those axioms that is longer: we would need to prove Godel's Second Incompleteness Theorem to show that , thus , and thus .

(Of course, I could be missing a much shorter proof, but I think this particular example goes through. The same reasoning also holds for the example Benja mentioned.)

Therefore, I will go out on a limb and propose the following:

Revived Trolljecture: If we consider a modal agent inside a modal universe , then for sufficiently large , the "correct" logical counterfactuals of the form "if , then " are the ones such that the valuations are high. In particular, if for sufficiently large, there is a short proof in of and no comparably short proof of , then is unique with this property, and this is the "correct" counterfactual utility for .

I'm not very confident that this is true as stated, but I have some reasonable hope that either a counterexample will be enlightening about logical counterfactuals, or that an improved analogue of the valuation will in fact be quite useful.

Notes on this valuation:

  • This is a claim about "third person" counterfactuals, not those done by an agent itself. Obviously, an agent that makes decisions based on this valuation will not be definable in modal logic, and I don't yet make any claims about what such an agent might do in the PA versions of such problems.

  • The approximate coherence is of the following form: if the shortest proof of contradiction from has length , and one of the statements , , , has a proof from of length , then I don't know a cleverer proof than going through the cases, but each case is straightforward. For instance, if , then we must have , and so it follows quickly that ; then, considering the three cases , , and neither, in each case it is clear that .

  • It does not, however, seem to be the case that whenever there is an easily provable relation between and . I am not entirely sure that I'm not missing a necessary logical relation, but it seems like we could consider two sentences and whose proofs are "independent" of the contradiction in and of each other, and then let so that is quickly provable from . Say that , , ; if these are independent in that way, we should be able to get , , , and . Then , , and will be at least 1/4 since and , so . I don't know whether this could be used to break the revived trolljecture.

  • If the shortest proof of contradiction from has length , then for any , ; the valuations are bounded away from 0 and 1. This may be a feature and not a bug: a mathematician could reason decently confidently about what math would look like in the counterfactual where Fermat's Last Theorem were false (presuming that its actual proof is quite long), but it is hard to say very much about what math would look like in the counterfactual where . However, there are some examples, like naive set theory before Russell's Paradox, where human mathematicians reasoned mostly coherently despite using axioms that (they did not notice) led quickly to contradictions; the valuation does not capture that aspect of mathematical intuition.

  • I restricted the new trolljecture to modal universes because it is well-specified how the stronger axiom schemas eventually decide the agent and the universe in the standard model of the natural numbers. If something like it holds up here, then there may be a more general version.

New Comment
5 comments, sorted by Click to highlight new comments since: Today at 10:51 AM

Interestingly enough, the approximate coherence condition doesn't hold when there is a short proof of φ from ψ, but it does hold when there is a sufficiently short proof of ψ from φ. (Very roughly, the valuation of (φ∧¬ψ) is negligible, while the valuation of (φ∧ψ) is approximately equal to the valuation of φ.) So coherence only works one way.

On a mostly unrelated note, this sort of reasoning doesn't seem to link well with the strategy of "add important mathematical facts that you discover into your pool of starting axioms to speed up the process of deriving things." While a set of axioms, and a set of axioms plus a bunch of utility tool theorems (Godel's 2nd incompleteness, Lob's theorem, the fundamental theorem of calculus, etc..) may "span" the same space of theorems, the second one is much easier to quickly derive new interesting statements from, and seems to be how humans think about math. The inconsistency involved in getting 10 utility on the 5-and-10 problem is much easier to spot if the second incompleteness theorem is already in your pool of sentences to apply to a new problem.

As with the Russel's Paradox example, in practice, counterfactuals seem to be mind-dependent, and vary depending on which of the many different lines of reasoning a mind heads down. If you define a subjective distance relative to a particular search algorithm, this objective valuation would just use the shortest possible subjective distance to the statement and a contradiction. The valuation using the human distance function of a statement in naive set theory would be high, because we were slow to notice the contradiction. So that aspect of counterfactual reasoning seems easy to capture.

Has anyone checked what this does on ASP problems?

ASP has only been formalized in ways that don't translate to modal universes, so it's only the analogous extensions that could apply there.

In the formulation here, it's not enough that the agent one-box; it must do so provably in less than steps. I think that both the valuations and will probably be about , but I'm not sure.

Sam Eisenstat produced the following counterexample to a stronger version of the revived trolljecture (one using soundness rather than consistency in the escalated theories):

Take a statement in the language of PA such that neither nor has a short proof (less than length ) in PA+Sound(PA); can be provably true, provably false, or undecidable in that system.

Define to equal 10 if , 0 if , and 5 if .

Define to equal if there is a short proof in PA (less than length ) that , and otherwise. Clearly and .

Now I claim that in the formal system PA+Sound(PA)+, there is a very short proof of . And this is the case whether is true, false, or undecidable!

In order to prove , take the axiom . It follows from the definition of that there is a short proof in PA (less than length ) that . And it follows from the soundness of PA that in fact , and thus .

On the other hand, we cannot quickly prove , for if we could, then we could prove . Things get tricky because we're using the extra (inconsistent) axiom , but intuitively, if is some arithmetical assertion that has nothing to do with the decision problem, then there shouldn't be any short proof of in this system either, and so the shortest proof of should be at least comparable to length .

I don't know, that line of reasoning that U()=10 seems like a pretty clear consequence of PA+Sound(PA)+A()=a, and the lack of a counterfactual for "X is false" doesn't violate any of my intuitions. It's just reasoning backwards from "The agent takes action a" to the mathematical state of affairs that must have produced it (there is a short proof of X).

On second thought, the thing that broke the original trolljecture was reasoning backwards from "I take action a" to the mathematical state of affairs that produced it. Making inferences about the mathematical state of affairs in your counterfactuals using knowledge of your own decision procedure does seem to be a failure mode at first glance.

Maybe use the counterfactual of "find-and-replace all instances of X's source code in the universe program U with action a, and evaluate"? But that wouldn't work for different algorithms that depend on checking the same math facts. There needs to be some way to go from "X takes action A" to "closely related algorithm Y takes action B". But that's just inferring mathematical statements from the combination of actions and knowing X's decision rule.

I'll stick with the trolljecture as the current best candidate for "objective" counterfactuals, because reasoning backwards from actions and decision rules a short way into math facts seems needed to handle "logically related" algorithms, and this counterexample looks intuitively correct.

I can prove the property that for each hypothesis there is at most one such that has a high valuation (for sufficiently high PA+N), with the following caveat: it can sometimes take many steps to prove that in PA+N, so we'll need to include the length of that proof in our bound.

In what follows, we will take all subscripts of and to be for large.

For any , and thus

Also, . This implies , which implies

So we see that and cannot both be significantly larger than 2/3 if there is a short proof that .