Why do you believe that everything "effectively computable" can be computed by a Turing machine? In the same vein: How would you recognize this to be false?

I don't particularly doubt that this is true, but I want to get a better gears-level understanding of why it might be true. When this came up during my "fundamentals of computer science" class, my professor communicated high confidence that it was essentially correct (joking that he could speed up the graduation of any student who showed it to be false), but gave very little info on why he believed this in the first place.

I got a bit more out of the respective Wikipedia article. I get that people have come up with different formalisms for "effectively computable" and they all seem to be weaker than or equivalent to Turing machines, and that does seem like compelling evidence. But I want to understand whether there is more to it. There seem to be points in favor, like this quote from Gödel that I pulled from the Wikipedia article:

It may also be shown that a function which is computable ['reckonable'] in one of the systems Sᵢ, or even in a system of transfinite type, is already computable [reckonable] in S₁. Thus the concept 'computable' ['reckonable'] is in a certain definite sense 'absolute', while practically all other familiar metamathematical concepts (e.g. provable, definable, etc.) depend quite essentially on the system to which they are defined ...

Does this mean that the act of showing that something is computable in one system would effectively show how to do it in another system (e.g., a Turing machine)? To me this seems kind of circular, and to rely on the fact that there is no "stronger" formal system, itself stronger than a Turing machine, that we would use to show the existence of such a system (this is probably a confused thought, so help in understanding this would be welcome).

Another thought: What if our brains just happen to be Turing machines (but other things in the world are "stronger"), and there is just no way to formally describe these higher-level algorithms with our limited brains (analogy: could our brains come up with Turing machines if they were finite automata?)? Could there be ways for us to notice if this were the case?

So to sum up, I still find this quite confusing and thought this might be the right place to get good answers or to clarify these points.

Sometimes in mathematics, you can write 20 slightly different definitions and find you have defined 20 slightly different things. Other times you can write many different formalizations and find they are all equivalent. Turing completeness is the latter. It turns up in Turing machines, register machines, tilings of the plane, Conway's Game of Life, and many other places. There are weaker and stronger possibilities, like finite state machines, stack machines, and oracle machines. (I.e., a Turing machine with a magic black box that solves the halting problem is stronger than a normal Turing machine.)

Human brains are finite state machines. A Turing machine has unlimited memory and time.

Physical laws are generally continuous, but there exists a Turing machine that takes in a number N and computes the laws to accuracy 1/N. This isn't philosophically forced, but it seems to be the way things are: all serious physical theories are computable.

We could conceivably be in a universe that isn't simulatable by a Turing machine. Assuming our brains are simulatable, we could never know this absolutely, since simulators with a huge but finite amount of compute trying to trick us could never be ruled out. 0 and 1 aren't probabilities and you are never certain. Still, we could conceivably be in a situation where an uncomputable explanation is far simpler than any computable theory.

Thanks for the answer!

Human brains are finite state machines. A Turing machine has unlimited memory and time.

Oops! You're right, and it's something that I used to know. So IIRC, as long as your tape (and your time) is not infinite, you still have a finite state machine, so Turing machines are kind of finite state machines taken to the limit (as tape and time go to infinity)? Is that right?

tailcalled · 3y
Turing machines are finite state machines that have access to a memory tape. This was intended to be sort of analogous to humans being able to take notes on unbounded amounts of paper when thinking.
Donald Hobson · 3y
Turing machines are kind of the limit of finite state machines. There are particular Turing machines that can simulate all possible finite state machines. In general, an arbitrary sequence of finite state machines can be uncomputable. But there are particular primitive recursive functions that simulate arbitrary finite state machines (given a description of the machine as an input), and you don't need the full strength of Turing completeness to do that. So I would say: kind of, no. There are systems strictly stronger than all finite automata, yet not Turing complete. Really, the notion of a limit isn't rigorously defined here, so it's hard to say.
Morpheus · 3y
I fleshed out a bit more what I was imagining: I have a particular Turing machine (for example one that recognizes the language L := {aⁿbⁿ | n ∈ ℕ}). As long as you limit the amount of transitions/time and the length of the string, you could construct a finite state machine that recognizes the same language (for example L := {aⁿbⁿ | 0 ≤ n ≤ k, with k, n ∈ ℕ}). Naively, I'd imagine it to be possible, for any particular Turing machine, to construct an inductive rule for how to build the (n+1)-transition finite state machine and the (k+1)-memory state machine from the n-time and k-memory state machine. In that case, I'd imagine specifying a kind of "infinite state machine" by some (1-memory, 1-time) state machine and two induction rules for how to extend the state machine as long as no termination state is reached.
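For concreteness, here's a minimal sketch of the bounded machines described above, in Haskell (the encoding and names are mine, not anything from the thread). For each fixed k this really is a finite state machine, with states CountA 0..k, CountB 0..k, and Reject, and the "induction rule" from k to k+1 is just raising the bound:

```haskell
-- Finite automaton recognizing { a^n b^n | 0 <= n <= k } for a fixed bound k.
-- States track how many a's were read, then how many b's are still owed.
data State = CountA Int | CountB Int | Reject
  deriving (Eq, Show)

stepFSM :: Int -> State -> Char -> State
stepFSM k (CountA n) 'a' | n < k = CountA (n + 1)  -- still collecting a's
stepFSM _ (CountA n) 'b' | n > 0 = CountB (n - 1)  -- switch to consuming b's
stepFSM _ (CountB n) 'b' | n > 0 = CountB (n - 1)
stepFSM _ _          _           = Reject          -- anything else is fatal

accepts :: Int -> String -> Bool
accepts k s = case foldl (stepFSM k) (CountA 0) s of
  CountA 0 -> True   -- the empty string (n = 0)
  CountB 0 -> True   -- a^n followed by exactly n b's
  _        -> False

-- e.g. accepts 5 "aabb" == True, accepts 5 "aab" == False
```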
Morpheus · 2y
The language above can be recognized by a pushdown automaton or generated by a corresponding context-free grammar. I think past me had already heard about this idea, forgot about it, and then was trying really hard to rederive something similar to it.

I'll focus on the gears-level elaboration of why all those computational mechanisms are equivalent. In short: If you want to actually get anything done, Turing machines suck! They're painful to use, and that makes it hard to get insight by experimenting with them. Lambda calculus / combinator calculi are way better for actually seeing why this general equivalence is probably true, and then afterwards you can link that back to TMs. (But, if at all possible, don't start with them!)

Sure, in some sense Turing machines are "natural" or "obvious": If you start from finite state machines and add a stack, you get a push-down automaton. Do it again, and you essentially have a Turing machine. (Each stack is one end of the infinite tape.) There's more bookkeeping involved to properly handle "underflows" (when you reach the bottom of a stack and need to extend it) yadda yadda, so every "logical state" would require a whole group of states to actually implement, but that's a boring mechanical transformation. So just slap another layer of abstraction on it and don't actually work with 2-stack PDAs, and you're golden, right? Right…?

Well… actually the same problem repeats again and again. TMs are just extremely low-level and not amenable to abstraction. By default, you get two tape symbols. If you need more, you can emulate those, which will require multiple "implementation states" to implement a single "logical state", so you add another layer of abstraction… Same with multiple tapes or whatever else you need. But you're still stuck in ugly implementation details, because you need to move the head left and right to skip across stuff, and you can't really abstract that away. (Likewise, chaining TMs or using one as a "subroutine" from another isn't easily possible and involves lots of manual "rewiring".) So except for trivial examples, actually working with Turing machines is extremely painful. It's not surprising that it's really hard to see why they should be able to run all effectively computable functions.

Enter lambda calculus. It's basically just functions, on a purely syntactic level. You have binders or (anonymous) function definitions – the eponymous lambdas. You have variables, and you have nested application of variables and terms (just like you're used to from normal functions.) There's exactly one rule: If a function is applied to an argument, you substitute that argument into the body wherever the placeholder variable was used. So e.g. (λx.((fx)y)(gx))z → ((fz)y)(gz) by substituting z for x. (Actual use of lambda calculus needs a few more rules: You have to make sure that you're not aliasing variables and changing the meaning, but that's a relatively minor (if tricky) bookkeeping problem that does abstract away fairly cleanly.)
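To make that one rule concrete, here's a minimal interpreter sketch in Haskell (my own encoding, not from the post; it deliberately skips the capture/aliasing bookkeeping just mentioned, so it's only safe when all bound variable names are distinct):

```haskell
-- Untyped lambda terms: variables, binders, application.
data Exp = V String | L String Exp | A Exp Exp
  deriving Show

-- Substitute arg for x (naive: assumes no variable capture can occur).
subst :: String -> Exp -> Exp -> Exp
subst x arg (V y) | x == y    = arg
                  | otherwise = V y
subst x arg (L y b) | x == y    = L y b               -- x is shadowed: stop
                    | otherwise = L y (subst x arg b)
subst x arg (A f a) = A (subst x arg f) (subst x arg a)

-- One leftmost-outermost beta-reduction step, if a redex exists.
step :: Exp -> Maybe Exp
step (A (L x b) arg) = Just (subst x arg b)   -- the one rule
step (A f a) = case step f of
  Just f' -> Just (A f' a)
  Nothing -> A f <$> step a
step (L x b) = L x <$> step b
step _       = Nothing
```

Applied to the encoding of (λx.((fx)y)(gx))z, step performs exactly the substitution shown above and returns ((fz)y)(gz).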

Now, on top of that, you can quickly build a comfortable working environment. Want to actually name things? let x := y in t is just a ((λx.t)y). (The assignment being split around the whole code is a bit ugly, but you don't even need "actual" extensions, it's just syntax sugar.) Data can also be emulated quite comfortably, if you understand one key idea: Data is only useful because you eventually do something with it. If all you have is abstraction, just abstract out the "actually doing" part with a placeholder in the most general scheme that you could possibly need. (Generally an induction scheme or "fold".) [Note: I'll mostly use actual names instead of one-letter variables going forward, and may drop extra parentheses: f x y z = (((f x) y) z)]

  • In a way, Booleans are binary decisions. true := λtt.λff.tt, false := λtt.λff.ff (Now your if c then x else y just becomes a c x y… or you add some syntax sugar.)
  • In a way, natural numbers are iteration. zero := λze.λsu.ze, succ := λn.λze.λsu.su(n ze su). (If you look closely, that's just the induction principle. And all that the number "does" is: (succ (succ (succ zero)) x f) = f (f (f x)))
  • Lists are just natural numbers "with ornaments", a value dangling on every node. nil := λni.λco.ni, cons := λx.λxs.λni.λco.co x (xs ni co) The usual list functions are easy too: map := λf.λxs.(xs nil (λx.λxs.cons (f x) xs)), fold := λxs.xs (yep, the list – as represented – is the fold)
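If it helps to see these encodings type-checked, here's a Haskell transcription (a sketch of my own; the post stays in untyped lambda calculus, and I leave out map/fold because typing them would need impredicativity):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Booleans as binary decisions: "if c then x else y" is literally c x y.
type CBool = forall a. a -> a -> a

true, false :: CBool
true  tt _ = tt
false _ ff = ff

-- Naturals as iteration: all a number "does" is apply a function n times.
type CNat = forall a. a -> (a -> a) -> a

zero :: CNat
zero ze _ = ze

suc :: CNat -> CNat
suc n ze su = su (n ze su)

-- Lists as naturals "with ornaments": a value dangling on every node.
type CList x = forall a. a -> (x -> a -> a) -> a

nil :: CList x
nil ni _ = ni

cons :: x -> CList x -> CList x
cons x xs ni co = co x (xs ni co)

-- Sanity check: (suc (suc (suc zero))) 0 (+1) == 3
toInt :: CNat -> Int
toInt n = n 0 (+ 1)
```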

And so on… so I think you'll believe me that you basically have a full functional programming language, and that lambda calculus is equivalent to all those more usual computational substrates. (…and maybe Turing machines too?)

Still… even if lambda calculus is extremely simple in some ways, it's also disturbingly complex in others. Substitution everywhere inside a term is "spooky action at a distance", a priori it's quite plausible that this does a lot of "heavy lifting". So: Let's turn to combinator calculi. Let's keep the term structure and application of things, but get rid of all (object-level) variables and binders. Instead, we define three substitution rules: (using variables only at the meta-level)

  • I x = x
  • K v x = v (or, fully parenthesized: (K v) x = v)
  • S a b x = (a x) (b x) (or, again more precisely: ((S a) b) x = (a x) (b x))
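As a sanity check that these three rules really are all there is, here's a tiny rewriter for them, a minimal Haskell sketch (datatype and names are mine):

```haskell
-- SKI terms: the three combinators plus application.
data SKI = S | K | I | App SKI SKI
  deriving (Eq, Show)

-- One leftmost-outermost rewrite step, if any rule applies.
step :: SKI -> Maybe SKI
step (App I x)                 = Just x                          -- I x = x
step (App (App K v) _)         = Just v                          -- K v x = v
step (App (App (App S a) b) x) = Just (App (App a x) (App b x))  -- S a b x = (a x)(b x)
step (App f x) = case step f of
  Just f' -> Just (App f' x)
  Nothing -> App f <$> step x
step _ = Nothing

-- Rewrite until no rule applies (beware: may loop forever on some terms).
normalize :: SKI -> SKI
normalize t = maybe t normalize (step t)
```

For instance, normalize (App (App (App S K) K) I) reduces to I, matching the SKKx = x remark below.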

(Strictly speaking, we don't even need I, as SKKx = (Kx)(Kx) = x, so if we say I := SKK that's equivalent.) The interesting thing about these rules is that all lambda-terms can be translated into SKI-terms. (To get rid of a binder / λ, you recursively transform the body: Pretend the intended argument x is already applied to the body, how do you get it where it needs to go? If you have an application, you prefix it with S such that the argument is passed down into both branches. If a branch doesn't use the argument (that includes other variables), prefix it with K so it remains unchanged. Otherwise, it is the bound variable that should be substituted, so put an I and you get the value. Done.) Worst case, this will cause an exponential(!) increase of the term size, but it'll get the job done. After that, there's no more "looking inside terms" or dealing with variables, but extremely simple rewrite rules. To revisit the examples above:
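The recursive transformation just described fits in a few lines. Here's a sketch in Haskell (again my own names, extending an SKI term type with variables, binders, and application):

```haskell
-- Lambda terms over the S, K, I combinator constants.
data Term = Var String | Lam String Term | Ap Term Term
          | CS | CK | CI
  deriving Show

free :: String -> Term -> Bool
free x (Var y)   = x == y
free x (Ap a b)  = free x a || free x b
free x (Lam y b) = x /= y && free x b
free _ _         = False

-- Compile lambda terms to combinators, eliminating innermost binders first.
compile :: Term -> Term
compile (Ap a b)  = Ap (compile a) (compile b)
compile (Lam x b) = abstract x (compile b)
compile t         = t

-- "Pretend the intended argument x is already applied to the body."
abstract :: String -> Term -> Term
abstract x t | not (free x t) = Ap CK t  -- branch ignores x: K it away
abstract x (Var _)            = CI       -- it *is* x (guard ruled out others)
abstract x (Ap a b)           = Ap (Ap CS (abstract x a)) (abstract x b)
abstract _ t                  = t        -- unreachable: inner Lams already gone

-- compile (Lam "tt" (Lam "ff" (Var "tt")))  ==>  S (K K) I
```

Reassuringly, running it on λtt.λff.tt yields S(KK)I, matching the translation of true below.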

  • true := λtt.λff.tt → λtt.(K tt) → S(KK)I, false := λtt.λff.ff → λtt.(I) → KI
  • zero := λze.λsu.ze → S(KK)I, succ := λn.λze.λsu.su(n ze su) → λn.λze.SI(S(K(n ze))I) → λn.S(K(SI))(S(S(KS)(S(KK)(S(KK)(S(K n)I)))(KI)) → (S(K(S(K(SI))))(S(S(KS)(S(K(S(KS)))(S(K(S(KK)))(S(S(KS)(S(KK)I))(KI)))))(K(KI)))) […except I think I messed up somewhere.]
  • (And yes, I did that by hand… so I hope you'll accept that I stop there and don't translate the other examples.)

So actually you don't need lambda calculus (or variables, for that matter), just these two rules. You can compile your Haskell to this SKI-stuff.

Take a moment to process that.

Now I hope it's also pretty obvious that you can implement a Turing machine in Haskell? Same for imperative programming languages – look at e.g. the Whitespace interpreter (Haskell version in the archive, Idris version on Github). So at this point I think you have a pretty good intuition as for why all existing computational mechanisms can be reduced down to SKI-calculus.

As for the other direction… with a basic TM model, dealing with all those dynamically resizing parts would be extremely painful, but if you use an extended Turing machine that has lots of different symbols and also more than one tape, it's actually not that hard. Detecting reducible locations isn't hard if you use one tape as scratch space for counting nested parentheses. Doing the "reductions" (which can actually greatly expand the space needed, because S duplicates its argument) can also happen on a separate tape, and afterwards you copy the result back onto the "main tape", after adjusting the placement of the unchanged stuff on the side. (I hope that's sufficiently clear? This is basically using multiple tapes as a fixed number of temporary helper variables.)

And that's it. Now you can run your Turing machine in your Haskell on your Turing machine, or whatever… Hope you're having fun!

Thanks for your reply! I had never heard of SKI calculus!

In short: If you want to actually get anything done, Turing machines suck! They're painful to use, and that makes it hard to get insight by experimenting with them.

I feel like one advantage Turing machines have is that they are in some sense low-level (actually quite close to how we build our computers)? They do seem to have a nice abstraction for memory. I would have no idea how to characterize NSPACE with lambda calculus, for example (though of course googling reveals someone seems to have figured it out).

__nobody · 2y
They're actually quite different from how our computers work (just on the surface already, the program is unmodifiable and separate from the data, and the whole walking around is also a very important difference[1]), but I agree that they feel "electronics-y" / like a big ball of wires.

Don't forget that most of the complexity-theoretic definitions are based on Turing machines, because that happened to become the dominant model for the study of these kinds of things. Similar-but-slightly-different constructions would perhaps be more "natural" on lambda calculus, so if that had been the dominant model, then the Turing machine versions would have been slightly awkward instead.

But memory isn't actually a problem – see Lisp. Represent everything as a cons cell (a storage unit with two slots) or a "symbol" (basically an unmodifiable string that compares in O(1)), where cons cells nest arbitrarily. (The simplest cell representation is two pointers and maybe a few tag bits to decide what it represents… optionally plus garbage collector stuff. Very close to how our computers operate!) Variable names become symbols, application becomes a cell, and lambdas can either become a specially tagged cell, or a nested construction like (lambda ('var body)) (doesn't really make a difference memory-wise, just a constant factor.)

Time is also easy – just count the number of reductions, or maybe include the number of substitutions if you think that's important. (But again, constant factor, doesn't actually matter.)

You could even argue that the problem goes in the other direction: For Turing machines, you largely don't care about the states (often you don't even explicitly construct them, just argue that they could be constructed), whereas lambda calculus programs are routinely written out in full. So had lambda calculus become the basis, then program size / "problem description complexity", the "cost of abstractions", reusability and other related things might have been considered more…
Morpheus · 2y
These are good points. From my understanding of how processors work, it seems one would get most of the benefits you mention by having addresses/absolute locations (and thus being able to use pointers [by using addresses instead of left/right operations]). Does that ring true to you?
Morpheus · 2y
I think I got something from what you said: that in a certain sense we don't care about state with a Turing machine. It just occurred to me that the only reason we can use these more convenient designs in a processor is because its memory is limited. (Not sure if this is true? There is probably a really awkward way you could encode addresses even if your memory were infinite? You'd probably do some tree-like thing, and figuring out where you actually want to put stuff would take some time.)

I think it is just so that once you hit a level of "systematicity" you are not missing out on anything. It is not that the entities referenced when defining a Turing machine are especially powerful. If there were "magic symbols", one could imagine that if a mathematician just scribbled the right sigils, then a leaps-and-bounds better outcome could result. If you disregard what the mathematics is about, it is black marks on a white background. What makes it "systematic" or "mathematical" is that the black marks give sufficient guidance on how to make more marks. One could try to relax this a bit. It is physically possible to write "2+2=5" (and it is possible to make systems where "2+2=5" is a valid move, but in distinguishing valid from invalid moves we are assuming away things). But if what you write next doesn't have to do with what you have written before, it becomes doubtful to call that a "result".

One important idea is mutual emulation. If the mathematician can write a story of how a Turing machine would throw around symbols, and a Turing machine can throw around symbols describing how a mathematician might move a pen, you don't need to ask which capability is more "fundamental". If you have a result in one language, you can get it either by native operation or by one method emulating the other.

There are shadows of a possibility that there are interesting things that go a little beyond computation or efficient operation. To put it provocatively: if you have advice that has an "asspull" in it, then that is not a valid algorithm. One example could be "1. Try a thing. 2. If it fails, try another thing." One can turn this into a good algorithm with the flavor of "1. Enumerate all the possible answers. 2. Check each." For some mathematical tasks it might be that you just do something and something ends up working; there might not be a method to come up with mathematical discoveries. But everything that has a method is a technology and can be followed by a "dumb" instruction follower, i.e. the homework is "fair". But some intelligent system can benefit from advice that is not a (complete) method, i.e. when following the instructions they need to make genuinely creative/executive decisions, such as how to "try a bunch of different things".

One might be tempted to think that a good science result is a human encountering the world's sense data and executing correct inductive inference on it. But then you have stuff like Schrödinger's equation being just guessed (rather than being constructed from some hints in an understandable construction). Another story would be that people hallucinate all kinds of hypotheses and natural selection just leaves alive those that lucked out to be in correspondence with reality. But if there is a method to the madness, the story is trackable and we can say something about its properties. So any kind of system that can deal with system descriptions will be able to do all describable work (or, rephrasing: emulate/employ any kind of madness that would be necessary for particular outcomes).

I agree with most of this except for:

There are shadows of a possibility that there are interesting things that go a little beyond computation or efficient operation. To put it provocatively: if you have advice that has an "asspull" in it, then that is not a valid algorithm. One example could be "1. Try a thing. 2. If it fails, try another thing." One can turn this into a good algorithm with the flavor of "1. Enumerate all the possible answers. 2. Check each." For some mathematical tasks it might be that you just do something and something ends up working; there might…

Slider · 2y
It's more about getting at it via a negative definition, by pointing to things that are not the thing in question. For a positive definition we could go with something like "everything that can be made into a step-by-step list". Some of the negative "not steps":

1. Neural networks, when they get trained, contain their training in weights. These are not of the datatype "procedure". Yes, you use them in summation procedures, but it is hard to tell a story of it being a thought process or anything like that.
2. In the movie Tenet the protagonist is advised: "don't try to understand it, feel it". One way this could be taken to the limit is "even if you had the most sophisticated and adaptive understanding possible, you would fail at the task". Then the alternative object, "intuition", might have different limits and possibilities than technological acumen.
3. Bugs are etymologically connected to the biological creatures because they made electrical connections that neither the bug nor the circuit designers intended. Yet these machines actually existed and operated. They could be described as working "outside of procedure": even if it doesn't need the laws of physics to break down, it is in a certain sense specificationless. So a worry would be that some sort of computer could be impossible to design but could come about accidentally. Or that "designing" processes are bootstrapped with accidental processes: you can think of a cell as a machine, but the first cell might involve dice rolls to get assembled, which could not be thought of as machines.
4. I think in Turing's paper, when they specify that in principle a "dumb clerk" could carry out a computation, they kind of take the stance that "dumbness" is an atypical state for a human to be in. But if all of computation is already within "dumbness", what is there beyond that? This might be something like "insight": a worker works according to external orders and designs, but an autonomous agent defines his own creations/solutions. If we have 100 house builder…

I don't think the Church-Turing thesis is true: it is not true if general relativity is correct. See this paper for details: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=a2ae95b5f4eb0c61c45522b2498004b16c8a19f8

"What if our brains just happen to be Turing machines?" Another way to ask this might be, "Is  consciousness computable?" Might I suggest another question? "Does it matter that Turing machines are self-contained; in other words, that they do not sense or interact with their environment?" Our brains do not only sense and interact with our environment, they also sense and control our own bodies. And our bodies, at numerous levels, down to our individual cells, sense and control their own status. Does it matter that the canonical Turing machine has no sensory apparatus, other than that which detects ones and zeroes on the program tape? What if it instead also sensed the texture of the tape, the current through the motors and relays, the battery or mains voltage powering it, the feel of the tape advancing through the reader, whether or not it was in good repair, clean or dusty, oiled or unoiled. What if it detected tape jams and other malfunctions, and could take measures to correct the problem? What if it could sense the temperature of the room it was in, and control it, to the degree that temperature affected its function? What if it developed a taste for tapes of a particular brand? What if it found some programs more interesting than others? What if it detected the presence of other Turing machines? What if it had to compete with other Turing machines for access to electrical current, or to tapes of better quality? What if it found another Turing machine that shared some of its tastes, in tapes, programs, and electrical sources, but differed in others, and they discussed their preferences, each expanding the other's appreciation for the resources they needed? Or found that they disagreed so profoundly, that they had to destroy the other before still other machines became infected with such bizarre programs? 

Could all this sensing, all these preferences, and all the control mechanisms, operate off that one tape, threading back and forth through the reader? 

Just phrasing these questions convinces me that the Turing machine model of consciousness fails, that consciousness is not an algorithm, and is not remotely computable. It also convinces me that consciousness is not programmable. It must always self-develop in, not just a brain, but a body that it can control, and use to affect the world it lives in. 

Our brains do not only sense and interact with our environment, they also sense and control our own bodies. And our bodies, at numerous levels, down to our individual cells, sense and control their own status.

I find this a bit confusing: I don't consciously control most of what's going on in my body. I don't have a sense of the status of my mitochondria, or of any individual cell in my body that isn't specifically developed for sensing. So how is this related to consciousness?

It also convinces me that consciousness is not programmable. It must always self-develop…

Dave Moore · 3y
"I don't consciously control most of what's going on in my body" Indeed not--and yet your brain controls many of these lowlevel functions--and is affected by them in turn. Your brain cells, for instance, are powered by mitochondria. "I don't find just phrasing these questions very convincing." I admit, they weren't really meant to be--to another person. But what went on in my head as I asked myself those questions persuaded me, my only claim.  "With these assumptions, operating from a single tape is not very limiting" Note that in my model, the operation of the tape reader itself is controlled by the tape, even to such things as clearing malfunctions. How deep does that rabbit hole go?  Randall's cartoon is amusing, but how well do rocks handle non-deterministic quantum processes? Is it really the rocks doing the computations, or are they just outputting the computations that the desert dweller performs in his head? Can he really keep the current machine state in his head? Can machines like that adequately simulate the interactions between different particles and systems within our universe? I think there's good reason to think that conscious beings do not arise in the universe the rock machine simulates.  One more point: another way of thinking about what I wrote is that I'm essentially asserting that an embodied consciousness is not, by itself, a finite state machine. It is merely one part of the entire universe in which it is embedded, which it senses, and is affected by in other ways, and which it affects. I also recommend the implications of Adrian Thompson's evolved tone discrimination circuit, implemented in a FPGA. Lacking a designed clock, the circuit instead used multiple feedback loops, which took a very long time to untangle. Also, the final version of the circuit employed elements that were not in the designed signal paths of the array. Though the array was intended to be digital, the evolved circuit used analog effects to communicate with the isol
JBlack · 3y
All of the known laws of physics are computable to arbitrary precision by a Turing machine, including quantum decoherence. Even non-determinism and chaos aren't barriers, since you can simulate the state space of the system instead of just tracing one state.

In the cartoon, the computations done are just enormous numbers of finite state transitions with a much smaller state space than even a four-function calculator. The inclusion of a human figure in the picture is just to make it relatable.

The FPGA evolution experiment doesn't say anything at all about what can be simulated. That experiment can almost certainly be simulated, and similar experiments with similarly puzzling results have been simulated, and then shown to be valid in reality.

One thing we can be sure of: anything that we can discover in finite time can be simulated. The Church-Turing thesis deals with the behaviour of computation over unbounded times. All bounded computation is Turing-computable.