Verification Is Not Easier Than Generation In General

I think most people's intuitions come from more everyday experiences like:

It's easier to review papers than to write them.
Fraud is often caught using a tiny fraction of the effort required to perpetrate it.
I can tell that a piece of software is useful for me more easily than I can write it.

These observations seem relevant to questions like "can we delegate work to AI" because they are ubiquitous in everyday situations where we want to delegate work.

The claim in this post seems to be: sometimes it's easier to create an object with property P than to decide whether a borderline instance satisfies property P. You chose a complicated example but you just as well have used something very mundane like "Make a pile of sand more than 6 inches tall." I can do the task by making a 12 inch pile of sand, but if someone gives me a pile of sand that is 6.0000001 inches I'm going to need very precise measurement devices and philosophical clarification about what "tall"means.

I don't think this observation undermines the claim that "it is easier to verify that someone has made a tall pile of sand than to do it yourself." If someone gives me a 6.000001 inch tall pile of sand I can say "could you make it taller?" And if I can ask for a program that halts and someone gives me a program that looks for a proof of false in PA, I can just say "try again."

I do think there are plenty of examples where verification is not easier than generation (and certainly where verification is non-trivial). It's less clear what the relevance of that is.

[-]johnswentworth3y1210

I don't think the generalization of the OP is quite "sometimes it's easier to create an object with property P than to decide whether a borderline instance satisfies property P". Rather, the halting example suggests that verification is likely to be harder than generation specifically when there is some (possibly implicit) adversary. What makes verification potentially hard is the part where we have to quantify over all possible inputs - the verifier must work for any input.

Borderline cases are an issue for that quantifier, but more generally any sort of adversarial pressure is a potential problem.

Under that perspective, the "just ask it to try again on borderline cases" strategy doesn't look so promising, because a potential adversary is potentially optimizing against me - i.e. looking for cases which will fool not only my verifier, but myself.

As for the everyday experiences you list: I agree that such experiences seem to be where peoples' intuitions on the matter often come from. Much like the case in the OP, I think people select for problems for which verification is easy - after all, those are the problems which are most legible, easiest to outsource (and therefore most likely to be economically profitable), etc. On the other hand, once we actively look for cases where adversarial pressure makes verification hard, or where there's a "for all" quantifier, it's easy to find such cases. For instance, riffing on your own examples:

It's only easier to review papers than to write them because reviewers do not actually need to catch all problems. If missing a problem in a paper review resulted in a death sentence, I expect almost nobody would consider themselves competent to review papers.
Likewise with fraud: it's only easier to catch than to perpetrate if we're not expected to catch all of it, or even most of it.
It's easier to write secure software than to verify that a piece of software is secure.

[-]paulfchristiano3y112

If including an error in a paper resulted in a death sentence, no one would be competent to write papers either.
For fraud, I agree that "tractable fraud has a meaningful probability of being caught," and not "tractable fraud has a very high probability of being caught." But "meaningful probability of being caught" is just what we need for AI delegation.
Verifying that arbitrary software is secure (even if it's actually secure) is much harder than writing secure software. But verifiable and delegatable work is still extremely useful for the process of writing secure software.

To the extent that any of these problems are hard to verify, I think it's almost entirely because of the "position of the interior" where an attacker can focus their effort on hiding an attack in a single place but a defender needs to spread their effort out over the whole attack surface.

But in that case we just apply verification vs generation again. It's extremely hard to tell if code has a security problem, but in practice it's quite easy to verify a correct claim that code has a security problem. And that's what's relevant to AI delegation, since in fact we will be using AI systems to help oversee in this way.

If you want to argue that e.g. writing secure software is fundamentally hard to verify, I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it's really hard for someone to verify the claim that the vulnerability exists.

Rather, the halting example suggests that verification is likely to be harder than generation specifically when there is some (possibly implicit) adversary.

Rice's theorem says there are a lot of programs where you can't tell if they will halt. But if I want to write a program that will/won't halt, I'm just going to write a program for which it's obvious. And if I asked you to write a program that will/won't halt and you write the kind of program where I can't tell, I'm just going to send it back.

Now that could still be hard. You could put a subtle problem in your code that makes it so it halts eventually even though it looks like it obviously doesn't. But Rice's theorem doesn't say anything about that.

And reiterating the previous point, if someone wrote down a program that looks like it obviously doesn't halt, but secretly it does because of an adversarial trick, then I would very strongly expect that someone could point out my mistake to me and I would conclude that it is no longer obvious whether it halts. Counterexamples to this kind of optimism would be way more impactful.

[-]johnswentworth3y1-2

I think it would be much more interesting and helpful to exhibit a case of software with a vulnerability where it's really hard for someone to verify the claim that the vulnerability exists.

Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.

I get the impression that you have noticed the lack of observed counterexamples, and updated that counterexamples are rare, without noticing that you would also mostly not observe counterexamples even if they were common. (Though of course this is subject to the usual qualifiers about how it's difficult to guess other peoples' mental processes, you have better information than I about whether you indeed updated in such a way, etc.)

That said, if I were to actively look for such counterexamples in the context of software, the obfuscated C code competition would be one natural target.

We can also get indirect bits of evidence on the matter. For instance, we can look at jury trials, and notice that they are notoriously wildly unreliable in practice. That suggests that, relative to the cognition of a median-ish human, there must exist situations in which one lawyer can point out the problem in another's logic/evidence, and the the median-ish human will not be able verify it. Now, one could argue that this is merely because median-ish humans are not very bright (a claim I'd agree with), but then it's rather a large jump to claim that e.g. you or I is so smart that analogous problems are not common for us.

[-]paulfchristiano3y71

Conditional on such counterexamples existing, I would usually expect to not notice them. Even if someone displayed such a counterexample, it would presumably be quite difficult to verify that it is a counterexample. Therefore a lack of observation of such counterexamples is, at most, very weak evidence against their existence; we are forced to fall back on priors.

You can check whether there are examples where it takes an hour to notice a problem, or 10 hours, or 100 hours... You can check whether there are examples that require lots of expertise to evaluate. And so on. the question isn't whether there is some kind of magical example that is literally impossible to notice, it's whether there are cases where verification is hard relative to generation!
You can check whether you can generate examples, or whether other people believe that they can generate examples. The question is about whether a slightly superhuman AI can find examples, not whether they exist (and indeed whether they exist is more unfalsifiable, not because of the difficulty of recognizing them but because of the difficulty of finding them).
You can look for examples in domains where the ground truth is available. E.g. we can debate about the existence of bugs or vulnerabilities in software, and then ultimately settle the question by running the code and having someone demonstrate a vulnerability. If Alice claims something is a vulnerability but I can't verify her reasoning, then she can still demonstrate that it was correct by going and attacking the system.
I've looked at e.g. some results from the underhanded C competition and they are relatively easy for laypeople to recognize in a short amount of time when the attack is pointed out. I have not seen examples of attacks that are hard to recognize as plausible attacks without significant expertise or time, and I am legitimately interested in them.

I'm bowing out here, you are welcome to the last word.

[-]Vanessa Kosoy3y2022

deserves a little more credit than you give it. To interpret the claim correctly, we need to notice $P$ and $N P$ are classes of decision problems, not classes of proof systems for decision problems. You demonstrate that for a fixed proof system it is possible that generating proofs is easier than verifying proofs. However, if we fix a decision problem and allow any valid (i.e. sound and complete) proof system, then verifying cannot be harder than generating. Indeed, let $S_{1}$ be some proof system and $A$ an algorithm for generating proofs (i.e. an algorithm that finds a proof if a proof exists and outputs "nope" otherwise). Then, we can construct another proof system $S_{2}$ , in which a "proof" is just the empty string and "verifying" a proof for problem instance $x$ consists of running $A (x)$ and outputting "yes" if it found an $S_{1}$ -proof and "no" otherwise. Hence, verification in $S_{2}$ is no harder than generation in $S_{1}$ . Now, so far it's just $P \subseteq N P$ , which is trivial. The non-trivial part is: there exist problems for which verification is tractable (in some proof system) while generation is intractable (in any proof system). Arguably there are even many such problems (an informal claim).

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

31

Verification Is Not Easier Than Generation In General

31