Realism about rationality

by Richard Ngo4 min read16th Sep 201850 comments

32

Law-ThinkingAIRationality
Curated

Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural:

  • The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology).
  • The idea that there is an “ideal” decision theory.
  • The idea that AGI will very likely be an “agent”.
  • The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.
  • The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.
  • The idea that Aumann's agreement theorem is relevant to humans.
  • The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
  • The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.
  • The idea that having having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).

To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I’m quite optimistic about using maths to describe things in general. But starting from that historical baseline, I’m inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It's true that "messy" human intelligence is able to generalise to a wide variety of domains it hadn't evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky's dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined? I claim that the latter perspective is just as sensible as the former, and perhaps even more so - see, for example, Paul Christiano's model of the mind, which leads him to conclude that "imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion."

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.
O (a friend): You can’t just accept a contradiction! It’s like saying “I have an intuition that 51 is prime, so I’ll just accept that as an axiom.”
R: Morality isn’t like maths. It’s more like having tastes in food, and then having preferences that the tastes have certain consistency properties - but if your tastes are strong enough, you might just ignore some of those preferences.
O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn’t allow contradictions) are so much stronger than my object-level preferences that this wouldn’t happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.
R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having “beliefs” is still a very lossy approximation. And it’s plausible that there’s no canonical way to “scale me up”.

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I’m not. This is not a problem, per se: I'm happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it's a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn't necessarily follow from the community's founding thesis (that humans can and should be more rational).

32

58 comments, sorted by Highlighting new comments since Today at 9:39 PM
New Comment

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it's trying to point at.

The main problem is the word "realism". It isn't clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.

I agree that there's something kind of like rationality realism. I just don't think this post successfully points at it.

Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether fitness is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc), and so, is selected on different criteria.

So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it's more like fitness.

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum", and (2) doesn't depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy -- evolutionary biologists still see fitness as a very important subject, and don't seem to care that much about exactly how real the abstraction is.)

To the extent that this post has made a lot of people think that rationality realism is an important crux, it's quite plausible to me that it's made the discussion worse.

To expand more on (1) -- since it seems a lot of people found its negation plausible -- it seems like if there's an analogue for the theory of evolution, which uses relatively unreal concepts like "fitness" to help us understand rational agency, we'd like to know about it. In this view, MIRI-cluster is essentially saying "biologists should want to invent evolution. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

ETA: The original version of this comment conflated "evolution" and "reproductive fitness", I've updated it now (see also my reply to Ben Pace's comment).

Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality.

MIRI in general and you in particular seem unusually (to me) confident that:

1. We can learn more than we already know about rationality of "ideal" agents (or perhaps arbitrary agents?).

2. This understanding will allow us to build AI systems that we understand better than the ones we build today.

3. We will be able to do this in time for it to affect real AI systems. (This could be either because it is unusually tractable and can be solved very quickly, or because timelines are very long.)

This is primarily based on what research you and MIRI do, some of MIRI's strategy writing, writing like the Rocket Alignment problem and law thinking, and an assumption that you are choosing to do this research because you think it is an effective way to reduce AI risk (given your skills).

(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy). I'd be interested in an argument for the three points listed above without realism about rationality (I agree with 1, somewhat agree with 2, and don't agree with 3).

If you don't have realism about rationality, then I basically agree with this sentence, though I'd rephrase it:

MIRI-cluster is essentially saying "biologists should want to invent evolution. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

(ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.) My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

So I'd rephrase the sentence as: (ETA: changed the sentence a bit to talk about fitness instead of evolution)

MIRI-cluster is essentially saying "biologists should want to understand reproductive fitness. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "Yeah, it's a fascinating question to understand what makes animals fit, but given that we want to understand how antidepressants work, it is a better strategy to directly study what happens when an animal takes an antidepressant."

Which you could round off to "biologists don't need to know about reproductive fitness", in the sense that it is not the best use of their time.

ETA: I also have a model of you being less convinced by realism about rationality than others in the "MIRI crowd"; in particular, selection vs. control seems decidedly less "realist" than mesa-optimizers (which didn't have to be "realist", but was quite "realist" the way it was written, especially in its focus on search).

Huh? A lot of these points about evolution register to me as straightforwardly false. Understanding the theory of evolution moved us from "Why are there all these weird living things? Why do they exist? What is going on?" to "Each part of these organisms has been designed by a local hill-climbing process to maximise reproduction." If I looked into it, I expect I'd find out that early medicine found it very helpful to understand how the system was built. This is like me handing you a massive amount of code that has a bunch of weird outputs and telling you to make it work better and more efficiently, and the same thing but where I tell you what company made the code, why they made it, and how they made it, and loads of examples of other pieces of code they made in this fashion.

If I knew how to operationalise it I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.

A lot of these points about evolution register to me as straightforwardly false.

I don't know which particular points you mean. The only one that it sounds like you're arguing against is

he theory of evolution has not had nearly the same impact on our ability to make big things [...] I struggle to name a way that evolution affects an everyday person

Were there others?

I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.

I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are "real", in the sense that "real" is meant in the OP.) I don't think that an improved mathematical understanding of what makes particular animals more fit has had that much of an impact on anything.

Separately, I also think the general insight of "each part of these organisms has been designed by a local hill-climbing process to maximise reproduction" would not have been very influential in either medicine or biology, had it not been accompanied by the math (and assuming no one ever developed the math).

On reflection, my original comment was quite unclear about this, I'll add a note to it to clarify.

I do still stand by the thing that I meant in my original comment, which is that to the extent that you think rationality is like reproductive fitness (the claim made in the OP that Abram seems to agree with), where it is a very complicated mess of a function that we don't hope to capture in a simple equation; I don't think that improved understanding of that sort of thing has made much of an impact on our ability to do "big things" (as a proxy, things that affect normal people).

Within evolution, the claim would be that there has not been much impact from gaining an improved mathematical understanding of the reproductive fitness of some organism, or the "reproductive fitness" of some meme for memetic evolution.

I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are "real", in the sense that "real" is meant in the OP.)

In contrast, I think the general insight of "each part of these organisms has been designed by a local hill-climbing process to maximise reproduction" would not have been very influential in either medicine or biology, had it not been accompanied by the math.

But surely you wouldn't get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of 'reproductive fitness'.

But surely you wouldn't get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of 'reproductive fitness'.

Here is my understanding of what Abram thinks:

Rationality is like "reproductive fitness", in that it is hard to formalize and turn into hard math. Regardless of how much theoretical progress we make on understanding rationality, it is never going to turn into something that can make very precise, accurate predictions about real systems. Nonetheless, qualitative understanding of rationality, of the sort that can make rough predictions about real systems, is useful for AI safety.

Hopefully that makes it clear why I'm trying to imagine a counterfactual where the math was never developed.

It's possible that I'm misunderstanding Abram and he actually thinks that we will be able to make precise, accurate predictions about real systems; but if that's the case I think he in fact is "realist about rationality" and this post is in fact pointing at a crux between him and Richard (or him and me), though not as well as he would like.

(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to make capabilities "blindly" -- basically the processing-power heavy type of AI progress (applying enough tricks so you're not literally recapitulating evolution, but you're sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.

So I believe in some kind of knowledge to be had (ie, point #1).

Yeah, so, taking stock of the discussion again, it seems like:

  • There's a thing-I-believe-which-is-kind-of-like-rationality-realism.
  • Points 1 and 2 together seem more in line with that thing than "rationality realism" as I understood it from the OP.
  • You already believe #1, and somewhat believe #2.
  • We are both pessimistic about #3, but I'm so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
  • We probably do have some disagreement about something like "how real is rationality?" -- but I continue to strongly suspect it isn't that cruxy.
(ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

I checked whether I thought the analogy was right with "reproductive fitness" and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

Sorry it resulted in a confusing mixed metaphor overall.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution(ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.)

I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it's all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.

Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn't understand how organisms seeded on those planets would likely evolve.)

So -- it seems to me -- the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!

My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.

I think we disagree primarily on 2 (and also how doomy the default case is, but let's set that aside).

In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

I think that's a crux between you and me. I'm no longer sure if it's a crux between you and Richard. (ETA: I shouldn't call this a crux, I wouldn't change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the "unreal rationality" world to be similar to what Daniel mentions below:

I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

Yeah, I'm going to try to give a different explanation that doesn't involve "realness".

When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are "levers", "gears", "nails", etc.

A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write "x + y", I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don't have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don't have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don't need to communicate all the caveats and intuitions that would accompany a leaky abstraction.

One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.

It's fine if there's some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness -- if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)

If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can't build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can't generalize it to something more far-off. Some examples from these comment threads of what "inferences about directly related things" looks like:

current theories about why England had an industrial revolution when it did
[biology] has far more practical consequences (thinking of medicine)
understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]

Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g. You can say "overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic".

In contrast, for abstractions like "logic gates", "assembly language", "levers", etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you'd be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.

So now I'd go back and state our crux as:

Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?

I would guess not. It sounds like you would guess yes.

I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about "directly relevant things", which will probably let you say some interesting things about AI systems, just not very much. I'd expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that's precise enough to build hierarchies with.

(I think I'd also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)

(You might wonder why I'm optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is "directly relevant" to existing ML systems, and so you don't need to build hierarchies of abstraction -- just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.

Your few assumptions need to talk about the system you actually build. On the model I'm outlining, it's hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.

I generally like the re-framing here, and agree with the proposed crux.

I may try to reply more at the object level later.

ETA: I also have a model of you being less convinced by realism about rationality than others in the "MIRI crowd"; in particular, selection vs. control seems decidedly less "realist" than mesa-optimizers (which didn't have to be "realist", but was quite "realist" the way it was written, especially in its focus on search).

Just a quick reply to this part for now (but thanks for the extensive comment, I'll try to get to it at some point).

It makes sense. My recent series on myopia also fits this theme. But I don't get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of "agency" into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of "optimization". (This applies to people at MIRI and also people outside MIRI.)

*The one person who has given me push-back is Scott.

In contrast, I struggle to name a way that evolution affects an everyday person

I'm not sure what exactly you mean, but examples that come to mind:

  • Crops and domestic animals that have been artificially selected for various qualities.
  • The medical community encouraging people to not use antibiotics unnecessarily.
  • [Inheritance but not selection] The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.
Crops and domestic animals that have been artificially selected for various qualities.

I feel fairly confident this was done before we understood evolution.

The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.

Also seems like a thing we knew before we understood evolution.

The medical community encouraging people to not use antibiotics unnecessarily.

That one seems plausible; though I'd want to know more about the history of how this came up. It also seems like the sort of thing that we'd have figured out even if we didn't understand evolution, though it would have taken longer, and would have involved more deaths.

Going back to the AI case, my takeaway from this example is that understanding non-real things can still help if you need to get everything right the first time. And in fact, I do think that if you posit a discontinuity, such that we have to get everything right before that discontinuity, then the non-MIRI strategy looks worse because you can't gather as much empirical evidence (though I still wouldn't be convinced that the MIRI strategy is the right one).

Ah, I didn't quite realise you meant to talk about "human understanding of the theory of evolution" rather than evolution itself. I still suspect that the theory of evolution is so fundamental to our understanding of biology, and our understanding of biology so useful to humanity, that if human understanding of evolution doesn't contribute much to human welfare it's just because most applications deal with pretty long time-scales.

(Also I don't get why this discussion is treating evolution as 'non-real': stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)

(Also I don't get why this discussion is treating evolution as 'non-real': stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)

Yeah, I agree, see my edits to the original comment and also my reply to Ben. Abram's comment was talking about reproductive fitness the entire time and then suddenly switched to evolution at the end; I didn't notice this and kept thinking of evolution as reproductive fitness in my head, and then wrote a comment based on that where I used the word evolution despite thinking about reproductive fitness and the general idea of "there is a local hill-climbing search on reproductive fitness" while ignoring the hard math.

Which you could round off to "biologists don't need to know about evolution", in the sense that it is not the best use of their time.

The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.

See response to Daniel below; I find this one a little compelling (but not that much).

My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

For what it's worth, I think I disagree with this even when "non-real" means "as real as the theory of liberalism". One example is companies - my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and evaluated as so without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and found this study to support this half-baked memory.

(next paragraph is super political, but it's important to my point)

I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn't exactly mean 'moral goodness' but does imply the ability to support moral goodness - think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn't complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see complaints American constitutionalists have about how things are done today), but it was something that can be built upon.

At any rate, I think it's the case that the things that can be built off of these fake theories aren't reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it's possible to productively build off of them.

On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are "directly relevant".

If I agreed with the political example (and while I wouldn't say that myself, it's within the realm of plausibility), I'd consider that a particularly impressive version of this.

I'm confused how my examples don't count as 'building on' the relevant theories - it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that's true (and if the things in the real world actually successfully fulfilled their purpose), then I'd think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.

people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning

Agreed. I'd say they built things in the real world that were "one level above" their theories.

if that's true, [...] then I'd think that spending time and effort developing the relevant theories was worth it

Agreed.

you seem to be pointing at something else

Agreed.

Overall I think these relatively-imprecise theories let you build things "one level above", which I think your examples fit into. My claim is that it's very hard to use them to build things "2+ levels above".

Separately, I claim that:

  • "real AGI systems" are "2+ levels above" the sorts of theories that MIRI works on.
  • MIRI's theories will always be the relatively-imprecise theories that can't scale to "2+ levels above".

(All of this with weak confidence.)

I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don't know what you'd think of the first.

OK, I think I understand you now.

Overall I think these relatively-imprecise theories let you build things "one level above", which I think your examples fit into. My claim is that it's very hard to use them to build things "2+ levels above".

I think that I sort of agree if 'levels above' means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you're demanding which can independently break, which exponentially reduces the chance that you'll have no failure.

But also, to the extent that your theory is mathematisable and comes with 'error bars', you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I'd say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you'd prefer the initial levels be 'exact'.

"real AGI systems" are "2+ levels above" the sorts of theories that MIRI works on.

I'd say that they're some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I don't know how many levels of implementation that is).

When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised). They are also mathematised enough that I'm optimistic about upwards abstraction having the possibility of robustness. There are some exceptions (e.g. the mesa-optimisers paper), but they seem like they're on the path to greater mathematisability.

MIRI's theories will always be the relatively-imprecise theories that can't scale to "2+ levels above"

I'm not sure about this, but I disagree with the version that replaces 'MIRI's theories' with 'mathematical theories of embedded rationality', basically for the reasons that Vanessa discusses.

I disagree with the version that replaces 'MIRI's theories' with 'mathematical theories of embedded rationality'

Yeah, I think this is the sense in which realism about rationality is an important disagreement.

But also, to the extent that your theory is mathematisable and comes with 'error bars'

Yeah, I agree that this would make it easier to build multiple levels of abstractions "on top". I also would be surprised if mathematical theories of embedded rationality came with tight error bounds (where "tight" means "not so wide as to be useless"). For example, current theories of generalization in deep learning do not provide tight error bounds to my knowledge, except in special cases that don't apply to the main successes of deep learning.

When I read a MIRI paper, it typically seems to me that the theories discussed are pretty abstract, and as such there are more levels below than above. [...] They are also mathematised enough that I'm optimistic about upwards abstraction having the possibility of robustness.

Agreed.

The levels below seem mostly unproblematic (except for machine learning, which in the form of deep learning is often under-theorised).

I am basically only concerned about machine learning, when I say that you can't build on the theories. My understanding of MIRI's mainline story of impact is that they develop some theory that AI researchers use to change the way they do machine learning that leads to safe AI. This sounds to me like there are multiple levels of inference: "MIRI's theory" -> "machine learning" -> "AGI". This isn't exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers.

You could imagine other stories of impact, and I'd have other questions about those, e.g. if the story was "MIRI's theory will tell us how to build aligned AGI without machine learning", I'd be asking when the theory was going to include computational complexity.

I like this review and think it was very helpful in understanding your (Abram's) perspective, as well as highlighting some flaws in the original post, and ways that I'd been unclear in communicating my intuitions. In the rest of my comment I'll try write a synthesis of my intentions for the original post with your comments; I'd be interested in the extent to which you agree or disagree.

We can distinguish between two ways to understand a concept X. For lack of better terminology, I'll call them "understanding how X functions" and "understanding the nature of X". I conflated these in the original post in a confusing way.

For example, I'd say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don't yet exist, or even prove things about those components (e.g. there's probably useful maths connecting graph theory with optimal nerve wiring), but it's still very grounded in concrete examples. If we thought that we should study how intelligence functions in a similar way as we study how fitness functions, that might look like a combination of cognitive science and machine learning.

By comparison, understanding the nature of X involves performing a conceptual reduction on X by coming up with a theory which is capable of describing X in a more precise or complete way. The pre-theoretic concept of fitness (if it even existed) might have been something like "the number and quality of an organism's offspring". Whereas the evolutionary notion of fitness is much more specific, and uses maths to link fitness with other concepts like allele frequency.

Momentum isn't really a good example to illustrate this distinction, so perhaps we could use another concept from physics, like electricity. We can understand how electricity functions in a lawlike way by understanding the relationship between voltage, resistance and current in a circuit, and so on, even when we don't know what electricity is. If we thought that we should study how intelligence functions in a similar way as the discoverers of electricity studied how it functions, that might involve doing theoretical RL research. But we also want to understand the nature of electricity (which turns out to be the flow of electrons). Using that knowledge, we can extend our theory of how electricity functions to cases which seem puzzling when we think in terms of voltage, current and resistance in circuits (even if we spend almost all our time still thinking in those terms in practice). This illustrates a more general point: you can understand a lot about how something functions without having a reductionist account of its nature - but not everything. And so in the long term, to understand really well how something functions, you need to understand its nature. (Perhaps understanding how CS algorithms work in practice, versus understanding the conceptual reduction of algorithms to Turing Machines, is another useful example).

I had previously thought that MIRI was trying to understand how intelligence functions. What I take from your review is that MIRI is first trying to understand the nature of intelligence. From this perspective, your earlier objection makes much more sense.

However, I still think that there are different ways you might go about understanding the nature of intelligence, and that "something kind of like rationality realism" might be a crux here (as you mention). One way that you might try to understand the nature of intelligence is by doing mathematical analysis of what happens in the limit of increasing intelligence. I interpret work on AIXI, logical inductors, and decision theory as falling into this category. This type of work feels analogous to some of Einstein's thought experiments about the limit of increasing speed. Would it have worked for discovering evolution? That is, would starting with a pre-theoretic concept of fitness and doing mathematical analysis of its limiting cases (e.g. by thinking about organisms that lived for arbitrarily long, or had arbitrarily large numbers of children) have helped people come up with evolution? I'm not sure. There's an argument that Malthus did something like this, by looking at long-term population dynamics. But you could also argue that the key insights leading up to the discovery evolution were primarily inspired by specific observations about the organisms around us. And in fact, even knowing evolutionary theory, I don't think that the extreme cases of fitness even make sense. So I would say that I am not a realist about "perfect fitness", even though the concept of fitness itself seems fine.

So an attempted rephrasing of the point I was originally trying to make, given this new terminology, is something like "if we succeed in finding a theory that tells us the nature of intelligence, it still won't make much sense in the limit, which is the place where MIRI seems to be primarily studying it (with some exceptions, e.g. your Partial Agency sequence). Instead, the best way to get that theory is to study how intelligence functions."

The reason I called it "rationality realism" not "intelligence realism" is that rationality has connotations of this limit or ideal existing, whereas intelligence doesn't. You might say that X is very intelligent, and Y is more intelligent than X, without agreeing that perfect intelligence exists. Whereas when we talk about rationality, there's usually an assumption that "perfect rationality" exists. I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations like "perfect fitness" which "aren't real" (I agree that this was quite unclear in the original post).

My proposed redefinition:

  • The "intelligence is intelligible" hypothesis is about how lawlike the best description of how intelligence functions will turn out to be.
  • The "realism about rationality" hypothesis is about how well-defined intelligence is in the limit (where I think of the limit of intelligence as "perfect rationality", and "well-defined" with respect not to our current understanding, but rather with respect to the best understanding of the nature of intelligence we'll ever discover).

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

  • The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
  • Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)

(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)

I'll try respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is "to start with idealized rationality and try to drag it down to Earth rather than the other way around". If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).

I appreciate that this isn't an argument that I've made in a thorough or compelling way yet - I'm working on a post which does so.

If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).

Hm. I already think the starting point of Bayesian decision theory (which is even "further up" than AIXI in how I am thinking about it) is fairly useful.

  • In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as 'utility' (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn't always serve very well (e.g. one might prefer Kelley betting), but it was kind of the starting point (probability theory getting its starting point from gambling games) and the idea seems like a useful decision-making mechanism in a lot of situations.
  • Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.

AIXI adds to all this the idea of quantifying Occam's razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we're going to disagree on.

As for AIXItl, I think it's sort of taking the wrong approach to "dragging things down to earth". Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.

Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I'm not sure he qualifies as a "miri person")

  • I believe in some form of rationality realism: that is, that there's a neat mathematical theory of ideal rationality that's in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
  • If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.
  • Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:
    • I'd be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
    • I'd be less interested in probabilistic forecasting of things.
    • I'd want to find some higher-level thing that was more 'real'/mathematically characterisable, and study that instead.
    • I'd be less optimistic about the prospects for an 'ideal' decision and reasoning theory.
  • My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.
[This comment is no longer endorsed by its author]Reply

To answer the easy part of this question/remark, I don't work at MIRI and don't research agent foundations, so I think I shouldn't count as a "MIRI person", despite having good friends at MIRI and having interned there.

(On a related note, it seems to me that the terminology "MIRI person"/"MIRI cluster" obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)

I guess the main thing I want is an actual tally on "how many people definitively found this post to represent their crux", vs "how many people think that this represented other people's cruxes"

If I believed realism about rationality, I'd be closer to buying what I see as the MIRI story for impact. It's hard to say whether I'd actually change my mind without knowing the details of what exactly I'm updating to.

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me. Specifically, a concept doesn't have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the "cripser" concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn't). Much less so are concepts such as "atom", "fitness" or "demand". Nevertheless, physicists, biologist and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

How so?

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me.

Well, it's not entirely clear. First there is the "realism" claim, which might even be taken in contrast to mathematical abstraction; EG, "is IQ real, or is it just a mathematical abstraction"? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where "accurate" means, at least in part, helpfulness in making real predictions).

So the idea seems to be that there's a spectrum with physics at one extreme end. I'm not quite sure what goes at the other extreme end. Here's one possibility:

  • Physics
  • Chemistry
  • Biology
  • Psychology
  • Social Sciences
  • Humanities

A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, "realness" vs "mathematical modelability". Well, it's not clear exactly what that second axis should be.

Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect "reproductive fitness" levels rather than "momentum" levels.

Hmm, actually, I guess there's a tricky interpretational issue here, which is what it means to model agency exactly.

  • On the one hand, I fully believe in Eliezer's idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
  • But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.

I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.

Yeah, I should have been much more careful before throwing around words like "real". See the long comment I just posted for more clarification, and in particular this paragraph:

I'm not trying to argue that concepts which we can't formalise "aren't real", but rather that some concepts become incoherent when extrapolated a long way, and this tends to occur primarily for concepts which we can't formalise, and that it's those incoherent extrapolations which "aren't real" (I agree that this was quite unclear in the original post).

It seems almost tautologically true that you can't accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.

What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that's closer to "momentum" or "fitness", I'm not sure the question is even meaningful.

I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn't tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in then you know that it's possible to gain a lot from parallelization. If the problem is in then at least you can test solutions, et cetera.)

And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum"

Eliezer is a fan of law thinking, right? Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we're confused), but I'm skeptical of MIRI because they seem more confused than average to me.

In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined concepts and elegant, parsimonious theories. We can then "move up the ladder" from fundamental to emergent phenomena, going through high energy physics, molecular physics, condensed matter physics, biology, geophysics / astrophysics, psychology, sociology, economics... On each level more "mess" appears. Why? Occam's razor tells us that we should prioritize simple theories over complex theories. But, we shouldn't expect a theory to be more simple than the specification of the domain. The general theory of planets should be simpler than a detailed description of planet Earth, the general theory of atomic matter should be simpler than the theory of planets, the general theory of everything should be simpler than the theory of atomic matter. That's because when we're "moving up the ladder", we are actually zooming in on particular phenomena, and the information we need to specify "where to zoom in" is translated to the description complexity of theory.

What does it mean in practice about understanding messy domains? The way science solves this problem is by building a tower of knowledge. In this tower, each floor benefits from the interactions both with the floor above it and the floor beneath it. Without understanding macroscopic physics we wouldn't figure out atomic physics, and without figuring out atomic physics we wouldn't figure out high energy physics. This is knowledge "flowing down". But knowledge also "flows up": knowledge of high energy physics allows understanding particular phenomena in atomic physics, knowledge of atomic physics allows predicting the properties of materials and chemical reactions. (Admittedly, some floors in the tower we have now are rather ramshackle, but I think that ultimately the "tower method" succeeds everywhere, as much as success is possible at all).

How does mathematics come in here? Importantly, mathematics is not used only on the lower floors of the tower, but on all floors. The way "messiness" manifests is, the mathematical models for the higher floors are either less quantitatively accurate (but still contain qualitative inputs) or have a lot of parameters that need to be determined either empirically, or using the models of the lower floors (which is one way how knowledge flows up), or some combination of both. Nevertheless, scientists continue to successfully build and apply mathematical models even in "messy" fields like biology and economics.

So, what does it all mean for rationality and intelligence? On what floor does it sit? In fact, the subject of rationality of intelligence is not a single floor, but its own tower (maybe we should imagine science as a castle with many towers connected by bridges).

The foundation of this tower should be the general abstract theory of rationality. This theory is even more fundamental than fundamental physics, since it describes the principles from which all other knowledge is derived, including fundamental physics. We can regard it as a "theory of everything": it predicts everything by making those predictions that a rational agent should do. Solomonoff's theory and AIXI are a part of this foundation, but not all it. Considerations like computational resource constraints should also enter the picture: complexity theory teaches us that they are also fundamental, they don't requiring "zooming in" a lot.

But, computational resource constrains are only entirely natural when they are not tied to a particular model of computation. This only covers constraints such as "polynomial time" but not constraints such as time and even less so time. Therefore, once we introduce a particular model of computation (such as a RAM machine), we need to build another floor in the tower, one that will necessarily be "messier". Considering even more detailed properties of the hardware we have, the input/output channels we have, the goal system, the physical environment and the software tools we employ will correspond to adding more and more floors.

Once we agree that it shoud be possible to create a clean mathematical theory of rationality and intelligence, we can still debate whether it's useful. If we consider the problem of creating aligned AGI from an engineering perspective, it might seem for a moment that we don't really need the bottom layers. After all, when designing an airplane you don't need high energy physics. Well, high energy physics might help indirectly: perhaps it allowed predicting some exotic condensed matter phenomenon which we used to make a better power source, or better materials from which to build the aircraft. But often we can make do without those.

Such an approach might be fine, except that we also need to remember the risks. Now, safety is part of most engineering, and is definitely a part of airplane design. What level of the tower does it require? It depends on the kind of risks you face. If you're afraid the aircraft will not handle the stress and break apart, then you need mechanics and aerodynamics. If you're afraid the fuel will combust and explode, you better know chemistry. If you're afraid a lightning will strike the aircraft, you need knowledge of meteorology and electromagnetism, possibly plasma physics as well. The relevant domain of knowledge, and the relevant floor in the tower is a function of the nature of the risk.

What level of the tower do we need to understand AI risk? What is the source of AI risk? It is not in any detailed peculiarities of the world we inhabit. It is not in the details of the hardware used by the AI. It is not even related to a particular model of computation. AI risk is the result of Goodhart's curse, an extremely general property of optimization systems and intelligent agents. Therefore, addressing AI risk requires understanding the general abstract theory of rationality and intelligence. The upper floors will be needed as well, since the technology itself requires the upper floors (and since we're aligning with humans, who are messy). But, without the lower floors the aircraft will crash.

Although I don't necessarily subscribe to the precise set of claims characterized as "realism about rationality", I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. "Fitness" just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a "descendant". The latter is not as unambiguously defined as "momentum" but under normal conditions it is quite precise. The actual structure and dynamics of biological organisms and their environment is very complicated, but this does not preclude the abstract study of evolution, i.e. understanding which sort of dynamics are possible in principle (for general environments) and in which way they depend on the environment etc. Applying this knowledge to real-life evolution is not trivial (and it does require a lot of complementary empirical research), as is the application of theoretical knowledge in any domain to "messy" real-life examples, but that doesn't mean such knowledge is useless. On the contrary, such knowledge is often essential to progress.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It’s a mindset which makes the following ideas seem natural: The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/​or intelligence in general. (I don’t count brute force approaches like AIXI for the same reason I don’t consider physics a simple yet powerful description of biology)...

I wonder whether the OP also doesn't count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology. Indeed, it's hard to imagine we would achieve the modern level of understanding chemistry without understanding at least non-relativistic quantum mechanics, and it's hard to imagine we would make much headway in molecular biology without chemistry, thermodynamics et cetera.

This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks).

Once again, the OP uses the concept of "messiness" in a rather ambiguous way. It is true that human and animal intelligence is "messy" in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.

The disagreement here seems to revolve around the question of, when should we expect to have a simple theory for a given phenomenon (i.e. when does Occam's razor apply)? It seems clear that we should expect to have a simple theory of e.g. fundumental physics, but not a simple equation for the coastline of Africa. The difference is, physics is a unique object that has a fundumental role, whereas Africa is just one arbitrary continent among the set of all continents on all planets in the universe throughout its lifetime and all Everett branches. Therefore, we don't expect a simple description of Africa, but we do expect a relatively simple description of planetary physics that would tell us which continent shapes are possible and which are more likely.

Now, "rationality" and "intelligence" are in some sense even more fundumental than physics. Indeed, rationality is what tells us how to form correct beliefs, i.e. how to find the correct theory of physics. Looking an anthropic paradoxes, it is even arguable that making decisions is even more fundumental than forming beliefs (since anthropic paradoxes are situations in which assigning subjective probabilities seems meaningless but the correct decision is still well-defined via "functional decision theory" or something similar). Therefore, it seems like there has to be a simple theory of intelligence, even if specific instances of intelligence are complex by virtue of their adaptation to specific computational hardware, specific utility function (or maybe some more general concept of "values"), somewhat specific (although still fairly diverse) class of environments, and also by virtue of arbitrary flaws in their design (that are still mild enough to allow for intelligent behavior).

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

This line of thought would benefit from more clearly delineating descriptive versus prescriptive. The question we are trying to answer is: "if we build a powerful goal-oriented agent, what goal system should we give it?" That is, it is fundamentally a prescriptive rather than descriptive question. It seems rather clear that the best choice of goal system would be in some sense similar to "human goals". Moreover, it seems that if possibilities A and B are such that is ill-defined whether humans (or at least, those humans that determine the goal system of the powerful agent) prefer A or B, then there is no moral significance to choosing between A and B in the target goal system. Therefore, we only need to determine "human goals" within the precision to which they are actually well-defined, not within absolute precision.

Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

The question is not whether it is "fine". The question is, given a situation in which intuition A demands action X and intuition B demands action Y, what is the morally correct action? The answer might be "X", it might be "Y", it might be "both actions are equally good", or it might be even "Z" for some Z different from both X and Y. But any answer effectively determines a way to remove the contradiction, replacing it by a consistent overarching system. And, if we actually face that situation, we need to actually choose an answer.

It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence.

I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.

Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, "algorithms don't get simpler as they get better".

I don't know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:

  1. There is some and an algorithm for matrix multiplication of complexity for any s.t. no algorithm of complexity exists (AFAIK, the prevailing conjecture is ). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.
  2. There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches from above, and the problem of computing a program with given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.

Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn't require any astronomical coefficients like Levin search; incidentally, the algorithm is also good enough for most real world applications). In either case, the computational complexity of matrix multiplication is understandable in the sense I expect intelligence to be understandable.

So, it is possible that there is a relatively simple and effective algorithm for intelligence (although I still expect a lot of "messy" tweaking to get a good algorithm for any specific hardware architecture; indeed, computational complexity is only defined up to a polynomial if you don't specify a model of computation), or it is possible that there is a progression of increasingly complex and powerful algorithms that are very expensive to find. In the latter case, long AGI timelines become much more probable since biological evolution invested an enormous amount of resources in the search which we cannot easily match. In either case, there should be a theory that (i) defines what intelligence is (ii) predicts how intelligence depends on parameters such as description complexity and computational complexity.

A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that's as strong as human intelligence but easier to prove stuff about.

Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about. Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People have been making buildings and cannons before Newtonian mechanics, engines before thermodynamics and ways of using chemical reactions before quantum mechanics or modern atomic theory. The fact you can do something using trial and error doesn't mean trial and error is the only way to do it.

Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks.

I think "inherent mysteriousness" is also possible. Some complex things are intractable to prove stuff about.

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our phone knowledge) are models for navigating that territory.

Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.

The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren't part of the territory itself.

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundumental. (This might be a disagreement over use of words. If by "physics" you, by definition, refer to the territory, then it seems to miss my point about Occam's razor. Occam's razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven't constructed AGI yet.

I don't think it's meaningful to say that "weird physics may enable super Turing computation." Hypercomputation is just a mathematical abstraction. We can imagine, to a point, that we live in a universe which contains hypercomputers, but since our own brain is not a hypercomputer, we can never fully test such a theory. This IMO is the most fundumental significance of the Church-Turing thesis: since we only perceive the world through the lens of our own mind, then from our subjective point of view, the world only contains computable processes.

If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it's the ultimate physical law?

What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that is "really" a hypercomputer. There would be a lot of possible boxes that don't solve the halting problem that pass the same computable tests.

If there is some powerful computational hardware available, I would want the AI the use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a "prior over hypercomputable worlds". But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.

Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn't large enough to verify their strength. Is that right?

In some sense, yes, although for conventional computers you might settle on very slow verification. Unless you mean that, your mind has only finite memory/lifespan and therefore you cannot verify an arbitrary conventional computer within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language), however it is not clear whether you can be justifiably highly certain of such extra assumptions.

See also my reply to lbThingrb.

This can’t be right ... Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains.

It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.)

You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.)

One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at least as simple as a given theory that does.

Having said all that, I’m not sure any of it supports either side of the argument over whether there’s an ideal mathematical model of general intelligence, or whether there’s some sense in which intelligence is more fundamental than physics. I will say that I don’t think the Church-Turing thesis is some sort of metaphysical necessity baked into the concept of rationality. I’d characterize it as an empirical claim about (1) human intuition about what constitutes an algorithm, and (2) contingent limitations imposed on machines by the laws of physics.

It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it's not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity).

Moreover, one should distinguish between different senses in which we can be "modeling" something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the "modeling" we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects.

Regarding your hypothetical device, I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it "really" doesn't, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function and a system under test , there is no sequence of computable tests that will allow you to form some credence about the hypothesis s.t. this credence will converge to when the hypothesis is true and when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable ) when you can devise such a sequence of tests. (Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify is in the class, for example the class of all functions s.t. it is consistent with ZFC that is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself.)

My point is, the Church-Turing thesis implies (IMO) that the mathematical model of rationality/intelligence should be based on Turing machines at most, and this observation does not strongly depend on assumptions about physics. (Well, if hypercomputation is physically possible, and realized in the brain, and there is some intuitive part of our mind that uses hypercomputation in a crucial way, then this assertion would be wrong. That would contradict my own intuition about what reasoning is (including intuitive reasoning), besides everything we know about physics, but obviously this hypothesis has some positive probability.)

I didn't mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it's similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device's universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn't be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn't come up with a test that the machine didn't pass, I think believing that the machine was a true halting-problem oracle would be empirically justified.

It's true that a black box oracle could output a nonstandard "counterfeit" halting function which claimed that some actually non-halting TMs do halt, only for TMs that can't be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn't reached a halting state, and conclude that the device was lying to us. If n were large enough, that wouldn't be feasible, but it's a decisive test that a normal computer could execute in principle. I suppose my magical device could instead do something like leave an infinite output string in memory, that a normal computer would never know was infinite, because it could only ever examine finitely much of it. But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We'll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?

Actually, hypothesizing that my device "computed" a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable. A better skeptical hypothesis would be that the device passed off some actually halting TMs as non-halting, but only in cases where the shortest proof that any of those TMs would have halted eventually was too long for humans to have discovered yet. I don't know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis). Intuitively, though, it seems to me that, if you went long enough without finding proof that the device wasn't a true hypercomputer, continuing to insist that such proof would be found at some future time would start to sound like a God-of-the-gaps argument. I think this reasoning is valid even in a hypothetical universe in which human brains couldn't do anything Turing machines can't do, but other physical systems could. I admit that's a nontrivial, contestable conclusion. I'm just going on intuition here.

Nearly everything you said here was already addressed in my previous comment. Perhaps I didn't explain myself clearly?

It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases.

I wrote before that "I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?"

So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn't halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form "is larger than some known number ?" the answer would always be "yes".

But finite resource bounds already prevent us from completely ruling out far-fetched hypotheses about even normal computers. We’ll never be able to test, e.g., an arbitrary-precision integer comparison function on all inputs that could feasibly be written down. Can we be sure it always returns a Boolean value, and never returns the Warner Brothers dancing frog?

Once again, there is a difference of principle. I wrote before that: "...given an uncomputable function and a system under test , there is no sequence of computable tests that will allow you to form some credence about the hypothesis s.t. this credence will converge to when the hypothesis is true and when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable ) when you can devise such a sequence of tests."

So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot.

Actually, hypothesizing that my device “computed” a nonstandard version of the halting function would already be sort of self-defeating from a standpoint of skepticism about hypercomputation, because all nonstandard models of Peano arithmetic are known to be uncomputable.

Yes, I already wrote that: "Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify is in the class, for example the class of all functions s.t. it is consistent with ZFC that is the halting function. But the verification would be extremely slow and relatively parsimonious competing hypotheses would remain plausible for an extremely (uncomputably) long time. In any case, notice that the class itself has, in some strong sense, a computable description: specifically, the computable verification procedure itself."

So, yes, you could theoretically become certain the device is a hypercomputer (although reaching high certainly would take very long time), without knowing precisely which hypercomputer it is, but that doesn't mean you need to add non-computable hypotheses to your "prior", since that knowledge would still be expressible as a computable property of the world.

I don’t know enough about Solomonoff induction to say whether it would unduly privilege such hypotheses over the hypothesis that the device was a true hypercomputer (if it could even entertain such a hypothesis).

Literal Solomonoff induction (or even bounded versions of Solomonoff induction) is probably not the ultimate "true" model of induction, I was just using it as a simple example before. The true model will allow expressing hypotheses such as "all the even-numbered bits in the sequence are ", which involve computable properties of the environment that do not specify it completely. Making this idea precise is somewhat technical.

"Fitness" just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly definable modulo the concept of a "descendant". The latter is not as unambiguously defined as "momentum" but under normal conditions it is quite precise.

Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with "define its fitness in terms of more basic traits" is being able to build a model of how it can or should actually work, not just specify measurement criteria.

I wonder whether the OP also doesn't count all of computational learning theory? Also, physics is definitely not a sufficient description of biology but on the other hand, physics is still very useful for understanding biology.

I do consider computational learning theory to be evidence for rationality realism. However, I think it's an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents - to my knowledge it hasn't played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don't help you build better birds.

However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence. This is because the latter theory aims to describe mindspace as a whole rather than describing a particular rather arbitrary point inside it.
...
Now, "rationality" and "intelligence" are in some sense even more fundamental than physics... Therefore, it seems like there has to be a simple theory of intelligence.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it's not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns - something we're nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail.

We only need to determine "human goals" within the precision to which they are actually well-defined, not within absolute precision.

I agree, but it's plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.

It plays a minor role in deep learning, in the sense that some "deep" algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intelligence without actually creating the theory. However, the same could be said about any endeavor in history.

It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

Mathematical models of evolution might help you to build better evolutions. In order to build better birds, you would need mathematical models of birds, which are going to be much more messy.

This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it’s not the case.

I don't think it's a mere restatement? I am trying to show that "rationality realism" is what you should expect based on Occam's razor, which is a fundamental principle of reason. Possibly I just don't understand your position. In particular, I don't know what epistemology is like in the world you imagine. Maybe it's a subject for your next essay.

Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns

This seems to be confusing between objects and representations of objects. The assumption there is some mathematical theory at the core of human reasoning does not mean that a description of this mathematically theory should automatically exist in the conscious, symbol-manipulating part of the mind. You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

The response might depend on the framing if you're asked a question and given 10 seconds to answer it. If you're allowed to deliberate on the question, and in particular consider alternative framings, the answer becomes more well-defined. However, even if it is ill-defined, it doesn't really change anything. We can still ask the question "given the ability to optimize any utility function over the world now, what utility function should we choose?" Perhaps it means that we need consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

It seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks is not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!
...
It may be analogous to mathematical models of evolution, which are certainly true but don’t help you build better birds.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like "species will evolve faster when there are predators in their environment" and "species which use sexual reproduction will be able to adapt faster to novel environments". The analogous abstract theory of intelligence can tell us things like "agents will be less able to achieve their goals when they are opposed by other agents" and "agents with more compute will perform better in novel environments". These sorts of conclusions are not very useful for safety.

I don't think it's a mere restatement? I am trying to show that "rationality realism" is what you should expect based on Occam's razor, which is a fundamental principle of reason.

Sorry, my response was a little lazy, but at the same time I'm finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn't seem to me that this implies it's simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don't have time to think too much more about this now; will cover it in a follow-up.

You can have a reinforcement learning algorithm that is perfectly well-understood mathematically, and yet nowhere inside the state of the algorithm is a description of the algorithm itself or the mathematics behind it.

Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation - evidence which might be underweighted if people think of maths proofs as a representative example of reasoning.

We can still ask the question "given the ability to optimize any utility function over the world now, what utility function should we choose?" Perhaps it means that we need consider our answers to ethical questions provided a randomly generated framing. Or maybe it means something else. But in any case, it is a question that can and should be answered.

I'm going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I'm not sure there's any possible utility function which we'd actually be satisfied with maximising.

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

As a matter of fact, I emphatically do not agree. "Birds" are a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a "theory of self-sustaining, self-replicating machines".

Let's consider a clearer example: cars. In order to build a car, it is very useful to have a theory of mechanics, chemistry, thermodynamic etc. Just doings things by trial and error would be much less effective, especially if you don't want the car to occasionally explode (given that the frequency of explosions might be too low to affordably detect during testing). This is not because a car is "simple": a spaceship or, let's say, a gravity wave detector is much more complex than a car, and yet you hardly need less theory to make one.

And another example: cryptography. In fact, cryptography is not so far from AI safety: in the former case, you defend against an external adversary whereas in the latter you defend against perverse incentives and subagents inside the AI. If we had this conversation in the 1960s (say), you might have said that cryptography is obviously a complex, messy domain, and theorizing about it is next to useless, or at least not helpful for designing actual encryption systems (there was Shannon's work, but since it ignored computational complexity you can maybe compare it to algorithmic information theory and statistical learning theory for AI today; if we had this conversation in the 1930s, then there would next to no theory at all, even though encryption was practiced since ancient times). And yet, today theory plays an essential role in this field. The domain actually is very hard: most of the theory relies on complexity theoretic conjectures that we are still far from being able to prove (although I expect that most theoretical computer scientists would agree that eventually we will solve them). However, even without being able to formally prove everything, the ability to reduce the safety of many different protocols to a limited number of interconnected conjectures (some of which have an abundance of both theoretical and empirical evidence) allows us to immensely increase our confidence in those protocols.

Similarly, I expect an abstract theory of intelligence to be immensely useful for AI safety. Even just having precise language to define what "AI safety" means would be very helpful, especially to avoid counter-intuitive failure modes like the malign prior. At the very least, we could have provably safe but impractical machine learning protocols that would be an inspiration to more complex algorithms about which we cannot prove things directly (like in deep learning today). More optimistically (but still realistically IMO) we could have practical algorithms satisfying theoretical guarantees modulo a small number of well-studied conjectures, like in cryptography today. This way, theoretical and empirical research could feed into each other, the whole significantly surpassing the sum of its parts.

I think it was important to have something like this post exist. However, I now think it's not fit for purpose. In this discussion thread, rohinmshah, abramdemski and I end up spilling a lot of ink about a disagreement that ended up being at least partially because we took 'realism about rationality' to mean different things. rohinmshah thought that irrealism would mean that the theory of rationality was about as real as the theory of liberalism, abramdemski thought that irrealism would mean that the theory of rationality would be about as real as the theory of population genetics, and I leaned towards rohinmshah's position but also thought that it referred to something more akin to a mood than a proposition. I think that a better post would distinguish these three types of 'realism' and their consequences. However, I'm glad that this post sparked enough conversation for the better post to become real.