Here is an exploration of what Eliezer Yudkowsky means when he writes about deep vs shallow patterns (although I’ll be using "knowledge" instead of "pattern" for reasons explained in the next section). Not about any specific pattern Yudkowsky is discussing, mind you, but about what deep and shallow patterns are at all. In doing so, I don’t make any criticism of his ideas and instead focus on quoting him (seriously, this post is like 70% quotes) and interpreting him by finding the best explanation I can of his words (that still fits them, obviously). Still, there’s a risk that my interpretation misses some of his points and ideas: I’m building a lower-bound on his argument’s power that is as high as I can get, not an upper-bound. Also, I might just be completely wrong, in which case defer to Yudkowsky if he points out that I’m completely missing the point.

Thanks to Eliezer Yudkowsky, Steve Byrnes, John Wentworth, Connor Leahy, Richard Ngo, Kyle, Laria, Alex Turner, Daniel Kokotajlo and Logan Smith for helpful comments on a draft.

Back to the FOOM: Yudkowsky’s explanation

In recent discussions, Yudkowsky often talks about deep patterns and deep thinking. What he made clear in a comment on this draft is that he has been using the term “deep patterns” in two different ways:

  • What I’ll call deep knowledge, which is a form of human knowledge/theory as well as the related epistemic strategies. This is what I explore below.
  • What I’ll call deep cognition, which is the sort of deep patterns that Yudkowsky points out AGI would have. There’s a link and an analogy with the deep knowledge, but I don’t get it enough to write something convincing to me and Yudkowsky, so I’ll mostly avoid that topic in this post.

Focusing on deep knowledge then, Yudkowsky recently seems to ascribe his interlocutors’ failure to grasp his point to their inability to grasp different instances of deep knowledge.

(All quotes are from Yudkowsky unless mentioned otherwise)

(From the first discussion with Richard Ngo)

In particular, just as I have a model of the Other Person's Beliefs in which they think alignment is easy because they don't know about difficulties I see as very deep and fundamental and hard to avoid, I also have a model in which people think "why not just build an AI which does X but not Y?" because they don't realize what X and Y have in common, which is something that draws deeply on having deep models of intelligence. And it is hard to convey this deep theoretical grasp.

That being said, he doesn’t really explain what this sort of deep knowledge is.

(From the same discussion with Ngo)

(Though it's something of a restatement, a reason I'm not going into "my intuitions about how cognition works" is that past experience has led me to believe that conveying this info in a form that the Other Mind will actually absorb and operate, is really quite hard and takes a long discussion, relative to my current abilities to Actually Explain things; it is the sort of thing that might take doing homework exercises to grasp how one structure is appearing in many places, as opposed to just being flatly told that to no avail, and I have not figured out the homework exercises.)

The thing is, he did exactly that in the FOOM debate with Robin Hanson 13 years ago. (For those unaware of this debate, Yudkowsky is responding to Hanson’s use of trend extrapolations, like Moore’s law, to think about the intelligence explosion.)

(From The Weak Inside View (2008))

Robin keeps asking me what I’m getting at by talking about some reasoning as “deep” while other reasoning is supposed to be “surface.” One thing which makes me worry that something is “surface” is when it involves generalizing a level N feature across a shift in level N−1 causes.

For example, suppose you say, “Moore’s Law has held for the last sixty years, so it will hold for the next sixty years, even after the advent of superintelligence” (as Kurzweil seems to believe, since he draws his graphs well past the point where you’re buying a billion times human brainpower for $1,000).

Now, if the Law of Accelerating Change were an exogenous, ontologically fundamental, precise physical law, then you wouldn’t expect it to change with the advent of superintelligence.

But to the extent that you believe Moore’s Law depends on human engineers, and that the timescale of Moore’s Law has something to do with the timescale on which human engineers think, then extrapolating Moore’s Law across the advent of superintelligence is extrapolating it across a shift in the previous causal generator of Moore’s Law.

So I’m worried when I see generalizations extrapolated across a change in causal generators not themselves described—i.e., the generalization itself is on the level of the outputs of those generators and doesn’t describe the generators directly.

If, on the other hand, you extrapolate Moore’s Law out to 2015 because it’s been reasonably steady up until 2008—well, Reality is still allowed to say, “So what?” to a greater extent than we can expect to wake up one morning and find Mercury in Mars’s orbit. But I wouldn’t bet against you, if you just went ahead and drew the graph.

So what’s “surface” or “deep” depends on what kind of context shifts you try to extrapolate past

An important subtlety here comes from the possible conflation of two uses of “surface”: the implicit use of “surface knowledge” as the consequences of some underlying causal processes/generator, and the explicit use of “surface knowledge” as drawing similarities without thinking about the causal process generating them. To simplify the discussion, let’s use the more modern idiom of “shallow” for the more explicit sense here.

So what is Yudkowsky pointing at? Two entangled things:

  • If you have shallow knowledge, that is, a trend without an underlying causal model, then you can’t extend it when the causal process generating it changes. So if Moore’s law depends on “the timescale on which human engineers think”, we can’t extend it past the intelligence explosion, because then human engineers would be replaced by AI engineers, who would think faster.
  • If you have shallow knowledge, you can’t even know when to extend the trend safely because understanding when the underlying causal process changes is harder when you don’t know what the causal process is!
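The two points above can be made concrete with a toy simulation. This is only a sketch under made-up assumptions: the names `human_speed` and `speed_with_shift`, the rate of 0.5 doublings per year, the shift at year 40 and the 10x multiplier are all hypothetical choices of mine, not anything Yudkowsky specifies. It fits a line to 40 years of output from a simple causal generator, then compares the fitted trend with what the generator produces when the engineers' thinking speed does or does not change.

```python
def simulate_log_capability(years, thinking_speed):
    """Toy causal model: each year, log2(capability) grows by the
    thinking speed (in doublings per year) of whoever does the engineering."""
    log_cap = 0.0
    trajectory = []
    for year in range(years):
        trajectory.append(log_cap)
        log_cap += thinking_speed(year)
    return trajectory

def human_speed(year):
    # Assumption: human engineers deliver one doubling every two years.
    return 0.5

def speed_with_shift(year, shift_year=40, ai_multiplier=10.0):
    # Assumption: from shift_year on, engineers think ai_multiplier times faster.
    return 0.5 if year < shift_year else 0.5 * ai_multiplier

def fit_trend(trajectory):
    """Shallow knowledge: a least-squares line through (year, log2 capability),
    with no representation of what generates the points."""
    n = len(trajectory)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(trajectory) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, trajectory))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Observe 40 years produced by human engineers, then fit the trend.
history = simulate_log_capability(40, human_speed)
slope, intercept = fit_trend(history)

def trend_prediction(year):
    return intercept + slope * year

# Year 60 under the two scenarios, computed from the causal model itself.
truth_no_shift = simulate_log_capability(61, human_speed)[60]
truth_shift = simulate_log_capability(61, speed_with_shift)[60]

print("trend says:", trend_prediction(60))
print("generator unchanged:", truth_no_shift)
print("generator changed:", truth_shift)
```

The fitted line matches the unchanged generator and is badly off once the generator changes, and nothing in the fitted line itself signals which of the two situations you are in.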

Imagine a restaurant that has a dish you really like. The last 20 times you went to eat there, the dish was amazing. So should you expect that the next time it will also be great? Well, that depends on whether anything in the kitchen changes. Because you don’t understand what makes the dish great, you don’t know the most important aspects of the causal generators. So if they can’t buy their meat/meat-alternative at the same place, maybe that will change the taste; if the cook is replaced, maybe that will change the taste; if you go at a different time of the day, maybe that will change the taste.

You’re incapable of extending your trend (except by replicating all the conditions) to make a decent prediction because you don’t understand where it comes from. If on the other hand you knew why the dish was so amazing (maybe it’s the particular seasoning, or the chef’s touch), then now you can estimate its quality. But then you’re not using the trend, you’re using a model of the underlying causal process. 

Here is another phrasing by Yudkowsky from the same essay:

Though this is to some extent an argument produced after the conclusion, I would explain my reluctance to venture into quantitative futurism via the following trichotomy:

  • On problems whose pieces are individually precisely predictable, you can use the Strong Inside View to calculate a final outcome that has never been seen before—plot the trajectory of the first moon rocket before it is ever launched, or verify a computer chip before it is ever manufactured.
  • On problems that are drawn from a barrel of causally similar problems, where human optimism runs rampant and unforeseen troubles are common, the Outside View beats the Inside View. Trying to visualize the course of history piece by piece will turn out to not (for humans) work so well, and you’ll be better off assuming a probable distribution of results similar to previous historical occasions—without trying to adjust for all the reasons why this time will be different and better.
  • But on problems that are new things under the Sun, where there’s a huge change of context and a structural change in underlying causal forces, the Outside View also fails—try to use it, and you’ll just get into arguments about what is the proper domain of “similar historical cases” or what conclusions can be drawn therefrom. In this case, the best we can do is use the Weak Inside View—visualizing the causal process—to produce loose, qualitative conclusions about only those issues where there seems to be lopsided support.

More generally, these quotes point to what Yudkowsky means when he says “deep knowledge”: the sort of reasoning that focuses on underlying causal models.

As he says himself:

To stick my neck out further: I am liable to trust the Weak Inside View over a “surface” extrapolation, if the Weak Inside View drills down to a deeper causal level and the balance of support is sufficiently lopsided.

Before going deeper into how such deep knowledge/Weak Inside View works and how to build confidence in it, I want to touch upon the correspondence between this kind of thinking and the Lucas Critique in macroeconomics. This link has been pointed out in the comments of the recent discussions — we thus shouldn’t be surprised that Yudkowsky wrote about it 8 years ago (yet I was surprised by this).

(From Intelligence Explosion Microeconomics (2013))

The “outside view” (Kahneman and Lovallo 1993) is a term from the heuristics and biases program in experimental psychology. A number of experiments show that if you ask subjects for estimates of, say, when they will complete their Christmas shopping, the right question to ask is, “When did you finish your Christmas shopping last year?” and not, “How long do you think it will take you to finish your Christmas shopping?” The latter estimates tend to be vastly over-optimistic, and the former rather more realistic. In fact, as subjects are asked to make their estimates using more detail—visualize where, when, and how they will do their Christmas shopping—their estimates become more optimistic, and less accurate. Similar results show that the actual planners and implementers of a project, who have full acquaintance with the internal details, are often much more optimistic and much less accurate in their estimates compared to experienced outsiders who have relevant experience of similar projects but don’t know internal details. This is sometimes called the dichotomy of the inside view versus the outside view. The “inside view” is the estimate that takes into account all the details, and the “outside view” is the very rough estimate that would be made by comparing your project to other roughly similar projects without considering any special reasons why this project might be different.

The Lucas critique (Lucas 1976) in economics was written up in 1976 when “stagflation”—simultaneously high inflation and unemployment—was becoming a problem in the United States. Robert Lucas’s concrete point was that the Phillips curve trading off unemployment and inflation had been observed at a time when the Federal Reserve was trying to moderate inflation. When the Federal Reserve gave up on moderating inflation in order to drive down unemployment to an even lower level, employers and employees adjusted their long-term expectations to take into account continuing inflation, and the Phillips curve shifted. Lucas’s larger and meta-level point was that the previously observed Phillips curve wasn’t fundamental enough to be structurally invariant with respect to Federal Reserve policy—the concepts of inflation and unemployment weren’t deep enough to describe elementary things that would remain stable even as Federal Reserve policy shifted.

and later in that same essay:

The lesson of the outside view pushes us to use abstractions and curves that are clearly empirically measurable, and to beware inventing new abstractions that we can’t see directly.

The lesson of the Lucas critique pushes us to look for abstractions deep enough to describe growth curves that would be stable in the face of minds improving in speed, size, and software quality.

You can see how this plays out in the tension between “Let’s predict computer speeds using this very well-measured curve for Moore’s Law over time—where the heck is all this other stuff coming from?” versus “But almost any reasonable causal model that describes the role of human thinking and engineering in producing better computer chips, ought to predict that Moore’s Law would speed up once computer-based AIs were carrying out all the research!”

This last sentence in particular points out another important feature of deep knowledge: that it might be easier to say negative things (like “this can’t work”) than precise positive ones (like “this is the precise law”) because the negative thing can be something precluded by basically all coherent/reasonable causal explanations, while they still disagree on the precise details.

Let’s dig deeper into that by asking more generally what deep knowledge is useful for.

How does deep knowledge work?

We now have a pointer (however handwavy) to what Yudkowsky means by deep knowledge. Yet we have very few details at this point about what this sort of thinking looks like. To improve that situation, the next two subsections explore two questions about the nature of deep knowledge: what is it for, and where does it come from?

The gist of this section is that:

  • Deep knowledge is primarily useful for saying what isn’t possible/what can’t work, especially in cases (like alignment) where there is very little data to draw from. (The comparison Yudkowsky keeps coming back to is how thermodynamics allows you to rule out perpetual motion machines)
  • Deep knowledge takes the form of compressed constraints on solution/hypothesis space, which have weight behind them because they let us rederive most of our current knowledge from basic/compressed ideas, and finding such compression without a strong entanglement with reality is incredibly hard. (Here an example used by Yudkowsky is the sort of thought experiments, conservation laws, and general ideas about what physical laws look like that guided Einstein in his path to Special and General Relativity)

What is deep knowledge useful for?

The big difficulty that comes up again and again, in the FOOM debate with Hanson and the discussion with Ngo and Christiano, is that deep knowledge doesn’t always lead to quantitative predictions. That doesn’t mean that the deep knowledge isn’t quantitative itself (expected utility maximization is an example used by Yudkowsky that is completely formal and quantitative), but that the causal model only partially constrains what can happen. That is, it doesn’t constrain enough to make precise quantitative predictions. 

Going back to his introduction of the Weak Inside View, recall that he wrote:

But on problems that are new things under the Sun, where there’s a huge change of context and a structural change in underlying causal forces, the Outside View also fails—try to use it, and you’ll just get into arguments about what is the proper domain of “similar historical cases” or what conclusions can be drawn therefrom. In this case, the best we can do is use the Weak Inside View—visualizing the causal process—to produce loose, qualitative conclusions about only those issues where there seems to be lopsided support.

He follows up writing:

So to me it seems “obvious” that my view of optimization is only strong enough to produce loose, qualitative conclusions, and that it can only be matched to its retrodiction of history, or wielded to produce future predictions, on the level of qualitative physics.

“Things should speed up here,” I could maybe say. But not “The doubling time of this exponential should be cut in half.”

I aspire to a deeper understanding of intelligence than this, mind you. But I’m not sure that even perfect Bayesian enlightenment would let me predict quantitatively how long it will take an AI to solve various problems in advance of it solving them. That might just rest on features of an unexplored solution space which I can’t guess in advance, even though I understand the process that searches.

Let’s summarize it this way: deep knowledge only partially constrains the surface phenomena it describes (which translate into quantitative predictions), and it takes a lot of detailed deep knowledge (and often data) to refine it enough to pin down the exact phenomenon and make precise quantitative predictions. Alignment and AGI are fields where we don’t have that much deep knowledge, and the data is sparse, and thus we shouldn’t expect precise quantitative predictions anytime soon.

Of course, just because a prediction is qualitative doesn’t mean it comes from deep knowledge; not all hand-waving is wisdom. For a good criticism of shallow qualitative reasoning in alignment, let’s turn to Qualitative Strategies of Friendliness.

These then are three problems, with strategies of Friendliness built upon qualitative reasoning that seems to imply a positive link to utility:

The fragility of normal causal links when a superintelligence searches for more efficient paths through time;

The superexponential vastness of conceptspace, and the unnaturalness of the boundaries of our desires;

And all that would be lost, if success is less than complete, and a superintelligence squeezes the future without protecting everything of value in it.

The shallow qualitative reasoning criticized here relies too much on human common sense and superiority to the AI, when the situation to predict is about superintelligence/AGI. That is, this type of qualitative reasoning extrapolates across a change in causal generators.

On the other hand, Yudkowsky uses qualitative constraints to guide his criticism: he knows there’s a problem because the causal model forbids that kind of solution. Just like the laws of thermodynamics forbid perpetual motion machines.

Deep qualitative reasoning starts from the underlying (potentially quantitative) causal explanations and mostly tells you what cannot work or what cannot be done. That is, deep qualitative reasoning points out that a whole swath of search space is not going to yield anything. A related point is that Yudkowsky rarely (AFAIK) makes predictions, even qualitative ones. He sometimes admits that he might make some, but it feels more like a compromise with the prediction-centered other person than what the deep knowledge is really for. Whereas he constantly points out how certain things cannot work.

(From Qualitative Strategies of Friendliness (2008))

In general, a lot of naive-FAI plans I see proposed, have the property that, if actually implemented, the strategy might appear to work while the AI was dumber-than-human, but would fail when the AI was smarter than human.  The fully general reason for this is that while the AI is dumber-than-human, it may not yet be powerful enough to create the exceptional conditions that will break the neat little flowchart that would work if every link operated according to the 21st-century First-World modal event.

This is why, when you encounter the AGI wannabe who hasn't planned out a whole technical approach to FAI, and confront them with the problem for the first time, and they say, "Oh, we'll test it to make sure that doesn't happen, and if any problem like that turns up we'll correct it, now let me get back to the part of the problem that really interests me," know then that this one has not yet leveled up high enough to have interesting opinions.  It is a general point about failures in bad FAI strategies, that quite a few of them don't show up while the AI is in the infrahuman regime, and only show up once the strategy has gotten into the transhuman regime where it is too late to do anything about it.

(From the second discussion with Ngo)

I live in a world where I proceed with very strong confidence if I have a detailed formal theory that made detailed correct advance predictions, and otherwise go around saying, "well, it sure looks like X, but we can be on the lookout for a miracle too".

If this was a matter of thermodynamics, I wouldn't even be talking like this, and we wouldn't even be having this debate.

I'd just be saying, "Oh, that's a perpetual motion machine. You can't build one of those. Sorry." And that would be the end.

(From Security Mindset and Ordinary Paranoia (2017))

You need to master two ways of thinking, and there are a lot of people going around who have the first way of thinking but not the second. One way I’d describe the deeper skill is seeing a system’s security as resting on a story about why that system is safe. We want that safety-story to be as solid as possible. One of the implications is resting the story on as few assumptions as possible; as the saying goes, the only gear that never fails is one that has been designed out of the machine.


There’s something to be said for redundancy, and having fallbacks in case the unassailable wall falls; it can be wise to have additional lines of defense, so long as the added complexity does not make the larger system harder to understand or increase its vulnerable surfaces. But at the core you need a simple, solid story about why the system is secure, and a good security thinker will be trying to eliminate whole assumptions from that story and strengthening its core pillars, not only scurrying around parrying expected attacks and putting out risk-fires.

Or my reading of the whole discussion with Christiano, which is that Christiano constantly tries to get Yudkowsky to make a prediction, but the latter focuses on aspects of Christiano’s model and scenario that don’t fit his (Yudkowsky’s) deep knowledge.

I especially like the perpetual motion machines analogy, because it drives home how just proposing a tweak/solution without understanding Yudkowsky’s deep knowledge (and what it would take for it to not apply) has almost no chance of convincing him. Because if someone said they built a perpetual motion machine without discussing how they bypass the laws of thermodynamics, every scientifically literate person would be doubtful. On the other hand, if they seemed to be grappling with thermodynamics and arguing for a plausible way of winning, you’d be significantly more interested.

(I feel like Bostrom’s Orthogonality Thesis is a good example of such deep knowledge in alignment that most people get, and I already argued elsewhere that it serves mostly to show that you can’t solve alignment by just throwing competence at it; also note that Yudkowsky had the same pattern earlier or in parallel, and is still using it)

To summarize: the deep qualitative thinking that Yudkowsky points at by saying “deep knowledge” is the sort of thinking that cuts off a big chunk of possibility space, that is, it tells you the whole chunk cannot work. It also lets you judge from the way people propose a solution (whether they tackle the deep pattern or not) whether you should ascribe decent probability to them being right.

A last note in this section: although deep knowledge primarily leads to negative conclusions, it can also lead to positive knowledge through a particularly Bayesian mechanism: if the deep knowledge destroys every known hypothesis/proposal except one (or a small number of them), then that is strong evidence for the ones left.
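The Bayesian mechanism in this note can be sketched in a few lines. Everything specific here is an assumption for illustration: the hypothesis names, the prior weights, and the function `posterior_after_elimination` are hypothetical, and I include a catch-all for proposals nobody has made yet so that the surviving proposal is only favored relative to what remains on the table.

```python
def posterior_after_elimination(prior, ruled_out):
    """Drop the hypotheses that the deep constraint forbids,
    then renormalize the remaining probability mass."""
    surviving = {h: p for h, p in prior.items() if h not in ruled_out}
    total = sum(surviving.values())
    if total == 0:
        raise ValueError("the constraint ruled out every hypothesis")
    return {h: p / total for h, p in surviving.items()}

# Hypothetical prior over three known proposals plus a catch-all.
prior = {
    "proposal_1": 0.3,
    "proposal_2": 0.3,
    "proposal_3": 0.3,
    "something_not_yet_proposed": 0.1,
}

# Assumption: the deep knowledge forbids proposals 1 and 2.
posterior = posterior_after_elimination(prior, ruled_out={"proposal_1", "proposal_2"})

for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.2f}")
```

The only positive information here comes from the negative constraint: proposal_3 goes from 0.3 to roughly 0.75 without any direct evidence in its favor, and how high it ends up depends on how much weight the catch-all deserves.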

(This quote is more obscure than the others without the context. It’s from Intelligence Explosion Microeconomics (2013), and discusses the last step in a proposal for formalizing the sort of deep insight/pattern Yudkowsky leveraged during the FOOM debate. If you’re very confused, I feel like the most relevant part to my point is the last sentence.)

If Step Three is done wisely—with the priors reflecting an appropriate breadth of uncertainty—and doesn’t entirely founder on the basic difficulties of formal statistical learning when data is scarce, then I would expect any such formalization to yield mostly qualitative yes-or-no answers about a rare handful of answerable questions, rather than yielding narrow credible intervals about exactly how the internal processes of the intelligence explosion will run. A handful of yeses and nos is about the level of advance prediction that I think a reasonably achievable grasp on the subject should allow—we shouldn’t know most things about intelligence explosions this far in advance of observing one—we should just have a few rare cases of questions that have highly probable if crude answers. I think that one such answer is “AI go FOOM? Yes! AI go FOOM!” but I make no pretense of being able to state that it will proceed at a rate of 120,000 nanofooms per second.

Even at that level, covering the model space, producing a reasonable simplicity weighting, correctly hooking up historical experiences to allow falsification and updating, and getting back the rational predictions would be a rather ambitious endeavor that would be easy to get wrong. Nonetheless, I think that Step Three describes in principle what the ideal Bayesian answer would be, given our current collection of observations. In other words, the reason I endorse an AI-go-FOOM answer is that I think that our historical experiences falsify most regular growth curves over cognitive investments that wouldn’t produce a FOOM.

Where does deep knowledge come from?

Now that we have a decent grounding of what Yudkowsky thinks deep knowledge is for, the biggest question is how to find it, and how to know you have found good deep knowledge. After all, maybe the causal models one assumes are just bad?

This is the biggest difficulty that Hanson, Ngo, and Christiano seemed to have with Yudkowsky’s position.

(Robin Hanson, from the comments after Observing Optimization in the FOOM Debate)

If you can’t usefully connect your abstractions to the historical record, I sure hope you have some data you can connect them to. Otherwise I can’t imagine how you could have much confidence in them.

(Richard Ngo from his second discussion with Yudkowsky)

Let me put it this way. There are certain traps that, historically, humans have been very liable to fall into. For example, seeing a theory, which seems to match so beautifully and elegantly the data which we've collected so far, it's very easy to dramatically overestimate how much that data favours that theory. Fortunately, science has a very powerful social technology for avoiding this (i.e. making falsifiable predictions) which seems like approximately the only reliable way to avoid it - and yet you don't seem concerned at all about the lack of application of this technology to expected utility theory.

(Paul Christiano from his discussion with Yudkowsky)

OK, but you keep saying stuff about how people with my dumb views would be "caught flat-footed" by historical developments. Surely to be able to say something like that you need to be making some kind of prediction?

Note that these attitudes make sense. I especially like Ngo’s framing. Falsifiable predictions (even just postdictions) are the cornerstone of evaluating hypotheses in Science. It even feels to Ngo (as it felt to me) that Yudkowsky argued for that in the Sequences:

(Ngo from his second discussion with Yudkowsky)

I'm familiar with your writings on this, which is why I find myself surprised here. I could understand a perspective of "yes, it's unfortunate that there are no advanced predictions, it's a significant weakness, I wish more people were doing this so we could better understand this vitally important theory". But that seems very different from your perspective here.

(And Yudkowsky himself from Making Belief Pay Rent (In Anticipated Experience))

Above all, don’t ask what to believe—ask what to anticipate. Every question of belief should flow from a question of anticipation, and that question of anticipation should be the center of the inquiry. Every guess of belief should begin by flowing to a specific guess of anticipation, and should continue to pay rent in future anticipations. If a belief turns deadbeat, evict it.

But the thing is… rereading part of the Sequences, I feel Yudkowsky was making points about deep knowledge all along? Even the quote I just used, which I interpreted in my rereading a couple of weeks ago as being about making predictions, now sounds like it’s about the sort of negative form of knowledge that forbids “perpetual motion machines”. Notably, Yudkowsky is very adamant that beliefs must tell you what cannot happen. Yet that doesn’t at all imply making predictions of the form “this is how AGI will develop”, so much as saying things like “this approach to alignment cannot work”.

Also, should I point out that there’s a whole sequence dedicated to the ways rationality can do better than science? (Thanks to Steve Byrnes for the pointer). I’m also sure I would find a lot of relevant stuff by rereading Inadequate Equilibria too, but if I wait to have reread everything by Yudkowsky before posting, I’ll be there a long time…

My Initial Mistake and the Einstein Case

Let me jump in here with my best guess of Yudkowsky’s justification of deep knowledge: its ability to both

  • strongly compress “what sort of hypothesis ends up being right” without having to add anything ad-hoc-y to get our theory and hypotheses back;
  • and constrain anticipations in non-trivial ways.

The thing is, I got it completely wrong initially. Reading Einstein’s Arrogance (2007), an early Sequences post that is all about saying that Einstein had excellent reasons to believe in General Relativity’s correctness before experimental verification (of advance predictions), I thought that relativity was the deep knowledge and that Yudkowsky was pointing out how Einstein, having found an instance of true deep knowledge, could allow himself to be more confident than the social process of Science would permit in the absence of experimental justification.

Einstein’s Speed (2008) made it clear that I had been looking at the moon when I was supposed to see the pointing finger: the deep knowledge Yudkowsky pointed out was not relativity itself, but what let Einstein single it out by a lot of armchair reasoning and better use of what was already known.

In our world, Einstein didn't even use the perihelion precession of Mercury, except for verification of his answer produced by other means.  Einstein sat down in his armchair, and thought about how he would have designed the universe, to look the way he thought a universe should look—for example, that you shouldn't ought to be able to distinguish yourself accelerating in one direction, from the rest of the universe accelerating in the other direction.

And Einstein executed the whole long (multi-year!) chain of armchair reasoning, without making any mistakes that would have required further experimental evidence to pull him back on track.

More generally, I interpret the whole Science and Rationality Sequence as explaining how deep knowledge can let rationalists do something that isn’t in the purview of traditional Science: estimate which hypotheses make sense before the experimental predictions and evidence come in.

(From Faster Than Science (2008))

This doesn't mean that the process of deciding which ideas to test is unimportant to Science.  It means that Science doesn't specify it.


In practice, there are some scientific queries with a large enough answer space, that picking models at random to test, it would take zillions of years to hit on a model that made good predictions—like getting monkeys to type Shakespeare.

At the frontier of science—the boundary between ignorance and knowledge, where science advances—the process relies on at least some individual scientists (or working groups) seeing things that are not yet confirmed by Science.  That's how they know which hypotheses to test, in advance of the test itself.

If you take your Bayesian goggles off, you can say, "Well, they don't have to know, they just have to guess."  If you put your Bayesian goggles back on, you realize that "guessing" with 10% probability requires nearly as much epistemic work to have been successfully performed, behind the scenes, as "guessing" with 80% probability—at least for large answer spaces.

The scientist may not know he has done this epistemic work successfully, in advance of the experiment; but he must, in fact, have done it successfully!  Otherwise he will not even think of the correct hypothesis.  In large answer spaces, anyway.

There’s a subtlety that is easy to miss: Yudkowsky doesn’t say that merely specifying a hypothesis in a large answer space is strong evidence for it. After all, you can just generate any random guess. What he’s pointing at is that to ascribe a decent amount of probability to a specific hypothesis in a large space through updating on evidence, you need to cut away a whole swath of the space and redirect its probability onto your hypothesis. And from a purely computational perspective, this implies more work spent on whittling down hypotheses than on making the favored hypothesis certain enough through experimental verification.
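To make the arithmetic behind this point concrete, here is a minimal sketch. It assumes a uniform prior over a hypothetical answer space of 2^100 candidates, and it counts evidence in bits as the base-2 log of the ratio between posterior odds and prior odds. The space size and the function name `bits_to_reach` are illustrative choices of mine, not anything taken from Yudkowsky’s post.

```python
import math


def bits_to_reach(prob: float, n_hypotheses: int) -> float:
    """Bits of evidence needed to lift one hypothesis from a uniform prior
    over n_hypotheses candidates up to posterior probability `prob`."""
    prior_odds = 1 / (n_hypotheses - 1)   # odds of 1 : (n - 1) under a uniform prior
    posterior_odds = prob / (1 - prob)    # odds corresponding to the target probability
    return math.log2(posterior_odds / prior_odds)


N = 2 ** 100  # assumed size of the answer space (purely illustrative)

bits_10 = bits_to_reach(0.10, N)  # a "guess" held at 10%
bits_80 = bits_to_reach(0.80, N)  # a "guess" held at 80%

print(f"bits needed for 10%: {bits_10:.1f}")
print(f"bits needed for 80%: {bits_80:.1f}")
print(f"share of the work already done at 10%: {bits_10 / bits_80:.2f}")
```

Under these assumptions, reaching 10% takes roughly 97 bits and reaching 80% roughly 102, so about 95% of the epistemic work is already done by the time the hypothesis is merely worth testing. The exact numbers depend on the assumed size of the space; the larger it is, the smaller the gap becomes in relative terms.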

His claim then seems to be that Einstein, and other scientists who tended to “guess right” at what would later be experimentally confirmed, couldn’t have been just lucky: they must have found ways of whittling down the vastness of hypothesis space, so that they had any chance of proposing something that was potentially right.

Yudkowsky gives some pointers to what he thinks Einstein was doing right.

(From Einstein’s Speed (2008))

Rather than observe the planets, and infer what laws might cover their gravitation, Einstein was observing the other laws of physics, and inferring what new law might follow the same pattern.  Einstein wasn't finding an equation that covered the motion of gravitational bodies.  Einstein was finding a character-of-physical-law that covered previously observed equations, and that he could crank to predict the next equation that would be observed.

Nobody knows where the laws of physics come from, but Einstein's success with General Relativity shows that their common character is strong enough to predict the correct form of one law from having observed other laws, without necessarily needing to observe the precise effects of the law.

(In a general sense, of course, Einstein did know by observation that things fell down; but he did not get GR by backward inference from Mercury's exact perihelion advance.)

So in that interpretation, Einstein learned from previous physics and from thought experiments how to cut away the parts of the hypothesis space that didn’t sound like they could make good physical laws, until he was left with a small enough subspace that he could find the right fit by hand (even if that took him 10 years).

So, from a Bayesian perspective, what Einstein did is still induction, and still covered by the notion of a simple prior (Occam prior) that gets updated by new evidence.  It's just the prior was over the possible characters of physical law, and observing other physical laws let Einstein update his model of the character of physical law, which he then used to predict a particular law of gravitation.

If you didn't have the concept of a "character of physical law", what Einstein did would look like magic—plucking the correct model of gravitation out of the space of all possible equations, with vastly insufficient evidence.  But Einstein, by looking at other laws, cut down the space of possibilities for the next law.  He learned the alphabet in which physics was written, constraints to govern his answer.  Not magic, but reasoning on a higher level, across a wider domain, than what a naive reasoner might conceive to be the "model space" of only this one law.

In summary, deep knowledge doesn’t come in the form of a particularly neat hypothesis or compression; it is the engine of compression itself. Deep knowledge compresses “what sort of hypothesis tends to be correct”, such that it can be applied to the search for a correct hypothesis at the object level. That also cements the idea that deep knowledge gives constraints, not predictions: you don’t expect to have a criterion for correct hypotheses so strong that, given a massive hypothesis space, you can pinpoint the correct one.
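The idea of a prior over the “character of physical law” can be illustrated with a toy hierarchical Bayesian update. This is only a sketch under made-up assumptions: candidate “laws” are 10-bit strings, being a palindrome stands in for “having the right character”, and there are just two meta-hypotheses (`"sym"`: laws are always palindromes; `"any"`: laws can be anything). None of these names or modeling choices come from Yudkowsky; they are hypothetical stand-ins.

```python
from fractions import Fraction
from itertools import product

LENGTH = 10  # assumed length of a toy "law" (illustrative)
ALL_LAWS = ["".join(bits) for bits in product("01", repeat=LENGTH)]
SYMMETRIC_LAWS = [law for law in ALL_LAWS if law == law[::-1]]


def likelihood(law: str, meta: str) -> Fraction:
    """P(law | meta-hypothesis about the character of physical law)."""
    if meta == "sym":
        # Laws are drawn uniformly from the palindromes only.
        is_symmetric = law == law[::-1]
        return Fraction(1, len(SYMMETRIC_LAWS)) if is_symmetric else Fraction(0)
    # "any": laws are drawn uniformly from the whole space.
    return Fraction(1, len(ALL_LAWS))


def update_meta(prior: dict, observed_laws: list) -> dict:
    """Bayesian update of the meta-level prior after observing known laws."""
    posterior = dict(prior)
    for law in observed_laws:
        posterior = {m: p * likelihood(law, m) for m, p in posterior.items()}
    total = sum(posterior.values())
    return {m: p / total for m, p in posterior.items()}


def predictive(law: str, meta_posterior: dict) -> Fraction:
    """Probability assigned to a candidate next law, averaged over meta-hypotheses."""
    return sum(p * likelihood(law, m) for m, p in meta_posterior.items())


meta_prior = {"sym": Fraction(1, 2), "any": Fraction(1, 2)}
known_laws = ["0110110110", "1111111111", "1000000001"]  # all palindromes
meta_posterior = update_meta(meta_prior, known_laws)

flat = Fraction(1, len(ALL_LAWS))  # what a reasoner with no meta-level would assign
print("P(sym | known laws):", float(meta_posterior["sym"]))
print("symmetric candidate vs flat prior:",
      float(predictive("0000000000", meta_posterior) / flat))
print("asymmetric candidate vs flat prior:",
      float(predictive("0000000001", meta_posterior) / flat))
```

In this toy world, three previously observed laws are enough to make the “symmetric character” meta-hypothesis overwhelmingly likely. That in turn removes almost all probability from the 992 non-palindromic candidates, while still leaving 32 symmetric candidates to choose among: a constraint on the search, not a prediction of the answer.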

Here it is good to generalize my previous mistake; recall that I took General Relativity for the deep knowledge, when it was actually the sort of constraints on physical laws that Einstein used for even finding General Relativity. Why? I can almost hear Yudkowsky answering in my head: because General Relativity is the part accepted and acknowledged by Science. I don’t think it’s the only reason, but there’s an element of truth: I privileged the “proper” theory with experimental validation over the vaguer principles and concepts that led to it.

A similar mistake is to believe the deep knowledge is the theory when it actually is what the theory and the experiments unearthed. This is how I understand Yudkowsky’s use of thermodynamics and evolutionary biology: he points at the deep knowledge that led to, and was revealed by, the work on these theories, more than at the theories themselves.

Compression and Fountains of Knowledge

We still don’t have a good way of finding and checking deep knowledge, though. Not every constraint on hypothesis space is deep knowledge, or even knowledge at all. The obvious idea is to have a reason for that constraint. And the reason Yudkowsky goes for almost every time is compression. Not a compressed description, like Moore’s law; nor a “compression” that is as complex as the pattern of hypotheses it’s trying to capture. Compression in the sense that you get a simpler constraint that can get you most of the way to regenerating the knowledge you’re starting from.

This view of the importance of compression is everywhere in the Sequences. A great example is Truly Part of You, which asks what knowledge you could rederive if it were deleted from your mind. If you have a deep understanding of the subject, and you keep recursively asking how a piece of knowledge could be rederived and then how “what’s needed for the derivation” can be rederived, Yudkowsky argues that you will reach “fountains of knowledge”. Or in the terminology of this post, deep knowledge.

Almost as soon as I started reading about AI—even before I read McDermott—I realized it would be a really good idea to always ask myself: “How would I regenerate this knowledge if it were deleted from my mind?”

The deeper the deletion, the stricter the test. If all proofs of the Pythagorean Theorem were deleted from my mind, could I re-prove it? I think so. If all knowledge of the Pythagorean Theorem were deleted from my mind, would I notice the Pythagorean Theorem to re-prove? That’s harder to boast, without putting it to the test; but if you handed me a right triangle with sides of length 3 and 4, and told me that the length of the hypotenuse was calculable, I think I would be able to calculate it, if I still knew all the rest of my math.

What about the notion of mathematical proof? If no one had ever told it to me, would I be able to reinvent that on the basis of other beliefs I possess? There was a time when humanity did not have such a concept. Someone must have invented it. What was it that they noticed? Would I notice if I saw something equally novel and equally important? Would I be able to think that far outside the box?

How much of your knowledge could you regenerate? From how deep a deletion? It’s not just a test to cast out insufficiently connected beliefs. It’s a way of absorbing a fountain of knowledge, not just one fact.

What do these fountains look like? They’re not the fundamental theories themselves, but instead their underlying principles. Stuff like the principle of least action, Noether’s theorem and the principles underlying Statistical Mechanics (don’t know enough about it to name them). They are the crystallized insights which constrain the search space enough that we can rederive what we knew from them.

(Feynman might have agreed, given that he chose the atomic hypothesis/principle, “all things are made of atoms, little particles that move around in perpetual motion, attracting each other when they are a little distance apart, but repelling upon being squeezed into one another”, as the one sentence he would salvage for future generations in case of a cataclysm.)

Here I hear a voice in my mind saying “What does simple mean? Shouldn’t it be better defined?” Yet this doesn’t feel like a strong objection. Simple is tricky to define intensionally, but scientists and mathematicians tend to be pretty good at spotting it, as long as they don’t fall for Mysterious Answers. And most of the checks on deep knowledge seem to lie in its ability to rederive the known correct hypotheses without adding stuff during the derivation.

A final point before closing this section: Yudkowsky writes that the same sort of evidence can be gathered for more complex arguments if they can be summarized by simple arguments that still get most of the current data right. My understanding here is that he’s pointing at the wiggle room of deep knowledge, that is, at the irrelevant ways in which it can sometimes be off. This is important because asking for that wiggle room can sound like ad-hoc adaptation of the pattern, breaking the compression assumption.

(From Intelligence Explosion Microeconomics (2013))

In my case, I think how much I trusted a Step Three model would depend a lot on how well its arguments simplified, while still yielding the same net predictions and managing not to be falsified by history. I trust complicated arguments much more when they have simple versions that give mostly the same answers; I would trust my arguments about growth curves less if there weren’t also the simpler version, “Smart minds build even smarter minds.” If the model told me something I hadn’t expected, but I could translate the same argument back into simpler language and the model produced similar results even when given a few cross-validational shoves, I’d probably believe it.


Based on my reading of his position, Yudkowsky sees deep knowledge as highly compressed causal explanations of “what sort of hypothesis ends up being right”. The compression means that we can rederive the successful hypotheses and theories from the causal explanation. Finally, such deep knowledge translates into partial constraints on hypothesis space, which focus the search by pointing out what cannot work. This in turn means that deep knowledge is far better at saying what won’t work than at precisely predicting the correct hypothesis.

I also want to point out something that became clearer and clearer in reading old posts: Yudkowsky is nothing if not coherent. You might not like his tone in the recent discussions, but if someone has been saying the same thing for 13 years, nobody seems to get it, and their model predicts that this will lead to the end of the world, maybe they can get some slack for talking smack.

2 comments

For what it's worth, I often find Eliezer's arguments unpersuasive because they seem shallow. For example:

The insight is in realizing that the hypothetical planner is only one line of outer shell command away from being a Big Scary Thing and is therefore also liable to be Big and Scary in many ways.

This seems like a fuzzy "outside view" sort of argument. (Compare with: "A loaded gun is one trigger pull away from killing someone and is therefore liable to be deadly in many ways." On the other hand, a causal model of a gun lets you explain which specific gun operations can be deadly and why.)

I'm not saying Eliezer's conclusion is false. I find other arguments for that conclusion much more persuasive, e.g. involving mesa-optimizers, because there is a proposed failure type which I understand in causal/mechanistic terms.

(I can provide other examples of shallow-seeming arguments if desired.)

Great investigation/clarification of this recurring idea from the ongoing Late 2021 MIRI Conversations.

  • outside vs. inside view - I've thought about this before but hadn't read this clear a description of the differences and tradeoffs before (still catching up on Eliezer's old writings)
  • "deep knowledge is far better at saying what won’t work than at precisely predicting the correct hypothesis." - very useful takeaway

You might not like his tone in the recent discussions, but if someone has been saying the same thing for 13 years, nobody seems to get it, and their model predicts that this will lead to the end of the world, maybe they can get some slack for talking smack.

Good point and we should. Eliezer is a valuable source of ideas and experience around alignment, and it seems like he's contributed immensely to this whole enterprise.

I just hope all his smack talking doesn't turn off/away talented people coming to lend a hand on alignment. I expect a lot of people on this (AF) forum found it like me after reading all Open Phil and 80,000 Hours' convincing writing about the urgency of solving the AI alignment problem. It seems silly to have those orgs working hard to recruit people to help out, only to have them come over here and find one of the leading thinkers in the community going on frequent tirades about how much EAs suck, even though he doesn't know most of us. Not to mention folks like Paul and Richard who have been taking his heat directly in these marathon discussions!