Dependent Type Theory and Zero-Shot Reasoning

by Evan Hubinger 1y11th Jul 20181 comment

8


I.

When I sit down to write code, I can safely throw everything I can think of at the problem and just keep trying to run the code until it works—and when it works, I can actually see it working. If my code runs, it means I actually had to understand enough of the moving parts to get it to run. This is especially true in languages like Haskell with strong type constraints preventing me from doing anything too stupid.

When I sit down to prove a theorem, I have to be much more careful. One misstep in my reasoning and my whole proof will be invalid, without me even knowing I made a mistake until I or someone else catches it later. If my proof goes through, it's possible I understood enough of the moving parts to make it work, but it's also possible I didn't.

The standard solution to the problem of not having something to test your proof against when doing math is to use a proof checker, which lets you turn an arbitrary math problem into a programming problem. Instead of having to just get the right answer without ever being able to actually check your answer, you get lots of chances to test possible solutions.

Proof checkers are great, but traditional proof checkers are also pretty limited. Concepts like equality require specialized machinery like paramodulation to properly resolve, introducing new data types requires writing out long lists of PA-like axioms and in general modern functional programming tools that make writing out math easier like pattern-matching, ADTs, monads, etc. are just missing. When I wrote my own theorem prover/proof checker a while ago I ran into exactly these sorts of problems.

Dependent type theory is the modern way to solve all of these problems. Both equality and the natural numbers are defined just by giving the constructors, no axioms needed. For-alls and implications are just functions. Proving theorems really just feels like writing Haskell code.

I'm not going to try and give a dependent type theory tutorial here, because I don't think I would be able to do better than the one I used, Theorem Proving in Lean, which I highly recommend taking a look at. That being said, I do want to give a couple examples of what I mean before I move on. Don't worry if you don't understand this part, I just want to show you some actual code in a dependent type theory.

I said equality and the natural numbers were just data types, but to show you what I mean here is the actual definition of the natural numbers in the Lean standard library

inductive nat
| zero : nat
| succ (n : nat) : nat

where : means "has type of." Those few lines are all that are required to completely define the semantics of the natural numbers—the induction rule is automatically synthesized from the above definition.

Equality is slightly more complex, but only because you need a little bit more boilerplate. Here is the actual definition of equality in the Lean standard library

inductive eq {α : Sort u} (a : α) : α → Prop
| refl : eq a

which is really just the Lean translation of the Haskell

data Equals a = Refl

where eq a represents a = a and refl stands for reflexivity, the rule that a = a always holds.

II.

Even despite all of the advances in automated proof checking with dependent type theory, there's still a big disadvantage to typing up all of your proofs in a dependent type theory like Lean: formality. When you write a proof by hand, you have the ability to elide over all sorts of details that you have to make explicit when you type them up into a proof checker. Thus, the main reason not to use a proof checker is the very reason to use one in the first place.

I think there exists a middle ground between these two extremes, however. One thing which has most impressed me working with Nate Soares is his ability to construct massive mathematical proofs on paper without a proof checker. When I asked him how he did this, he told me he used the type checker in his head. I've been trying to do this as well recently, and I want to try to share what that looks like.

Lawvere's theorem is a general result in category theory that states that if there exists a surjective function (technically but we'll elide that detail), then has the property that any function has some fixed point such that . To give you a sense of how important this theorem is, it's the general result underlying both Gödel's incompleteness theorems and Löb's theorem. I claim that Lawvere's theorem is nearly trivial to prove if you just imagine you have a type checker in your head.

Our starting environment, where the represents the thing we're trying to prove, is

from which point the only possible way to proceed is to use , which, under the interpretation where for-alls are functions, requires us to pass it some . If you think about how to construct a with the right type out of what we currently have available, there really aren't very many options. In fact, I think there are basically just the two: and , but since we know we're trying to prove something about , the definition with in it seems like the safest bet. Thus, we get

which, passing to , gives us

which we can (classically) unpack into

and if two functions are equal, then they must be equal on all inputs, and the only possible input available is , so we get

which, where , completes the proof.

III.

I was talking with Scott Garrabrant late one night recently and he gave me the following problem: how do you get a fixed number of DFA-based robots to traverse an arbitrary maze (if the robots can locally communicate with each other)? My approach to this problem was to come up with and then try to falsify various possible solutions. I started with a hypothesis, threw it against counterexamples, fixed it to resolve the counterexamples, and iterated. If I could find a hypothesis which I could prove was unfalsifiable, then I'd be done.

When Scott noticed I was using this approach, he remarked on how different it was than what he was used to when doing math. Scott's approach, instead, was to just start proving all of the things he could about the system until he managed to prove that he had a solution. Thus, while I was working backwards by coming up with possible solutions, Scott was working forwards by expanding the scope of what he knew until he found the solution.

I think that the difference between my approach and Scott's approach elucidates an important distinction regarding strategies for solving problems in general. In Alex Zhu's post on zero-shot reasoning he talked about the difference between zero-shot reasoning, where you solve the problem without ever testing your solution, and many-shot reasoning, where you solve the problem by testing your solution repeatedly. In many ways, Scott's approach to the robot problem was a zero-shot approach, whereas my approach was a many-shot approach.

One thing that's interesting in this analogy, however, is that neither Scott or I were actually using any tools to solve the problem aside from our own brains. Thus, in some sense, both of our approaches were zero-shot. And yet, my approach certainly feels far more many-shot than Scott's. I'm going to call what I did in this situation many-shot reasoning for a zero-shot problem, and I think this idea is one that applies a lot more broadly than just in that conversation. One possible way to understand the distinction between MIRI's and Paul Christiano's strategies towards AI alignment, for example, is using this distinction. In this interpretation, MIRI is solving the zero-shot problem with zero-shot reasoning, whereas Paul is solving the zero-shot problem with many-shot reasoning.

In particular, I think this idea maps well onto why I find dependent type theory so useful for solving math problems. When I write up a math problem in Lean, I'm turning what is fundamentally a zero-shot problem into something more like a many-shot problem, since it gives me a way to constantly be testing my hypotheses. But something even more interesting is happening when I'm using the type checker in my head. In that case, I'm doing something more like what I did in the robot problem, where I'm using many-shot reasoning to solve a zero-shot problem. I'm never actually using anything to test my hypotheses—instead, I'm building a model in my head of the thing which would test my hypotheses, and then testing them against that.

I think this hints at what a solution to the zero-shot reasoning problem might look like: if you can build a model of the corresponding many-shot problem which is accurate enough, then you can use many-shot reasoning on that model to solve the original zero-shot problem.

8