Agency and the unreliable autonomous car

Ben Pace (4y)

What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.

JBlack (4y)

This looks full of straightforward errors.

Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.
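One way to make this requirement concrete is through the standard Hilbert-Bernays-Löb derivability conditions that the usual proof of Löb's theorem relies on. The notation $\Box P$ for "P is provable in T" is introduced here for illustration and does not appear in the post.

```latex
% Derivability conditions assumed of the provability predicate \Box of T:
\begin{align*}
\text{(D1)}\quad & T \vdash P \;\Longrightarrow\; T \vdash \Box P \\
\text{(D2)}\quad & T \vdash \Box(P \to Q) \to (\Box P \to \Box Q) \\
\text{(D3)}\quad & T \vdash \Box P \to \Box\Box P
\end{align*}
% Löb's theorem: for any sentence P,
\[
T \vdash \Box P \to P \;\Longrightarrow\; T \vdash P .
\]
```

If $\Box$ is replaced by "provable in at most a million characters", (D2) is no longer automatic, since combining a proof of $P \to Q$ with a proof of $P$ can exceed the bound. That is roughly where the unbounded-length assumption enters.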

Alex Flint (4y)

> Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

Are you referring to step 5a/5b of the algorithm, which says to search for a proof and then act based on what is found? I wouldn't exactly characterize this as adding a further deduction rule. But perhaps you are referring to something else?

> This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.

It's true this gets a bit more finicky in the resource-bounded case. Critch's paper on resource-bounded Löb goes into this. I haven't looked into this very much.

JBlack (4y)

Yes, I'm referring to 5a/5b in conjunction with self-reflection as a deduction rule, since the agent is described as being able to use these to derive new propositions.

There is also a serious problem with 5a itself: the agent needs to try to prove that some new proposition P is "consistent with its model of the world". That is, for its existing axioms and derivation rules T, prove that T+P does not derive a contradiction.

If T is consistent, then this is impossible by Gödel's second incompleteness theorem. Hence for any P, Step 5a will always exhaust all possible proofs of up to whatever bound and return without finding such a proof. Otherwise T is inconsistent and it may be able to find such a proof, but it is also obvious that its proofs can't be trusted and not at all surprising that it will take the wrong route.
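The step from Gödel's second incompleteness theorem to "this is impossible" can be spelled out as follows. This is a sketch in standard provability notation ($\Box_T$ for provability in T, $\mathrm{Con}$ for consistency), which is an assumption of this note rather than notation from the post.

```latex
\begin{align*}
\mathrm{Con}(T + P) \;&:\equiv\; \neg\Box_T \neg P \\
T \;&\vdash\; \Box_T \bot \to \Box_T \neg P
  && \text{(an inconsistent theory proves everything)} \\
T \;&\vdash\; \mathrm{Con}(T + P) \to \mathrm{Con}(T)
  && \text{(contrapositive, with } \mathrm{Con}(T) :\equiv \neg\Box_T \bot\text{)} \\
T \vdash \mathrm{Con}(T + P) \;&\Longrightarrow\; T \vdash \mathrm{Con}(T)
  && \text{(ruled out for consistent } T \text{ by G\"odel II)}
\end{align*}
```

So a proof inside T that P is consistent with T would already be a proof that T itself is consistent, whatever P is.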

Alex Flint (4y)

I think that I should modify 5a from "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters" to "Search for a proof of this sentence, using your model of the world as a set of starting assumptions". This is indeed a significant change to the algorithm and I thank you for pointing it out. I think that would resolve your second concern, about the problem with 5a itself, yes?

I think it might also resolve your first concern, about the unsoundness of the logical system, because the agent does not have a deduction rule that says that if P can be proved then P is true. Rather, it reasons from the fact that P can be proved, together with the fact that this particular agent chooses its actions based on proof search and will behave in such-and-such a way if it finds a proof of this particular P, to conclude P. This only works for certain particular sentences P, such as the one that is constructed in this story, since this particular sentence P is one that, if the agent finds a proof of it, will cause it to take actions that lead to P itself being true.
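The shape of this argument can be sketched as an instance of Löb's theorem. Here $\Box P$ abbreviates "the agent's proof search finds a proof of P"; that notation, and the idealization that proof search is unbounded, are assumptions of this sketch and not part of the algorithm as stated.

```latex
\begin{align*}
&\text{(1)}\quad T \vdash \Box P \to P
  && \text{(finding a proof of } P \text{ triggers actions that make } P \text{ true)} \\
&\text{(2)}\quad T \vdash P
  && \text{(L\"ob's theorem applied to (1))}
\end{align*}
```

Step (1) is a fact about this particular agent and this particular P, not a general rule $\Box P \to P$ for all sentences, which is why the argument does not need a soundness schema.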

AI ALIGNMENT FORUM