This is a transcription of Eliezer Yudkowsky responding to Paul Christiano's Takeoff Speeds live on Sep. 14, followed by a conversation between Eliezer and Paul. This discussion took place after Eliezer's conversation with Richard Ngo.

I stand ready to bet with Eliezer on any topic related to AI, science, or technology. I'm happy for him to pick but I suggest some types of forecast below.

If Eliezer’s predictions were roughly as good as mine (in cases where we disagree), then I would update towards taking his views more seriously. Right now it looks to me like his view makes bad predictions about lots of everyday events.

It’s possible that we won’t be able to find cases where we disagree, and perhaps that Eliezer’s model totally agrees with mine until we develop AGI. But I think that’s unlikely for a few reasons:

• I constantly see observations that seem like evidence for Eliezer’s views (e.g. any time I see an ML paper with a surprisingly large effect size, or ML labs failing to make investments in scaling, or people being surprisingly unreasonable), it’s just that I see significantly more evidence against his views. The point of making bets in advance is that it can correct for my hindsight bias or for my inability to simulate “what Eliezer’s view would say about this.” Eliezer could also say that actually all of the observations I listed aren't evidence for his view, which would be interesting to me.
• Eliezer frequen

I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later.

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.  Another basic problem, as I'd see it, is that we tend to tell stories about very different subject matters - I care a lot less than Paul about the quantitative monetary amount invested into Intel, to the point of not really trying to develop expertise about that.

I claim that I came off better than Robin Hanson in our FOOM debate compared to the way that history went.  I'd claim that my early judgments of the probable importance of AGI, at all, stood up generally better than early non-Yudkowskian EA talking about that.  Other people I've noticed ever m... (read more)

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.

I agree that it's plausible that we both make the same predictions about the near future. I think we probably don't, and there are plenty of disagreements about all kinds of stuff. But if in fact we agree, then in 5 years you shouldn't say "and see how much the world looked like I said?"

It feels to me like it goes:  you say AGI will look crazy.  Then I say that sounds unlike the world of today. Then you say "no, the world actually always looks discontinuous in the ways I'm predicting and your model is constantly surprised by real stuff that happens, e.g. see transformers or AlphaGo" and then I say "OK, let's bet about literally anything at all, you pick."

I think it's pretty likely that we actually do disagree about how much the world of today is boring and continuo... (read more)

I feel a bit confused about where you think we meta-disagree here, meta-policy-wise.  If you have a thesis about the sort of things I'm liable to disagree with you about, because you think you're more familiar with the facts on the ground, can't you write up Paul's View of the Next Five Years and then if I disagree with it better yet, but if not, you still get to be right and collect Bayes points for the Next Five Years?

I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your typical EA; so if I wanted to expend the time/stamina points, I could say a bunch of things I consider obvious and that contradict hot takes on Twitter and many EAs would go "whoa wait really" and then I could collect Bayes points later and have performed a public service, even if nobody showed up to disagree with me about that.  (The reason I don't actually do this... is that I tried; I keep trying to write a book about basic macro, only it's the correct version explained correctly, and have a bunch of isolated chapters and unfinished drafts.)  I'm also trying to write up my version of The Next Five Years assuming the wo... (read more)

I think you think there's a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do.

I've said I'm happy to bet about anything, and listed some particular questions I'd bet about where I expect you to be wronger. If you had issued the same challenge to me, I would have picked one of the things and we would have already made some bets. So that's why I feel like the ball is in your court to say what things you're willing to make forecasts about.

That said, I don't know if making bets is at all a good use of time. I'm inclined to do it because I feel like your view really should be making different predictions (and I feel like you are participating in good faith and in fact would end up making different predictions). And I think it's probably more promising than trying to hash out the arguments since at this point I feel like I mostly know your position and it's incredibly slow going. But it seems very plausible that the right move is just to agree to disagree and not spend time on this. In that case it was particularly bad of me to try to claim the epistemic high ground. I can't really defend... (read more)

I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.

I'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense in either, but I can guess. I'm not sure what "wouldn't be especially surprised" means; I think to actually get disagreements we need way more resolution than that, so one question is whether you are willing to play ball (since presumably you'd also have to look into it to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd still need more resolution. (That said: what's up for grabs here are predictions about the future, not present.)

I'd guess that machine translation is currently something like $100M/year in value, and will scale up more like 2x/year than 10x/year as DL improves (e.g. most of the total log increase will be in years with <3x increase rather than >3x increase, and 3 is like the 60th percentile of the number for which that inequality is tight). I'd guess that increasing deployment of romantic chatbots will end up with technical change happening first followed by social change second, so the speed of deployment and change will depend ... (read more)

Eliezer Yudkowsky (7d): Thanks for continuing to try on this! Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. I also remark that I wouldn't be surprised to hear that waifutech is already past $1B/yr in China, but I haven't looked into things there. I don't expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there's particular things that waifutech won't be able to do?

My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it. Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually?

I think we're still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.

Paul Christiano (7d): Yes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of "users" which has even more definitional issues.

You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year. (I don't know what AI freedom-of-the-road means, do you mean something significantly more ambitious than self-driving trucks or taxis?)

Paul Christiano (7d): Man, the problem is that you say the "jump to newly accessible domains" will be the thing that lets you take over the world. So what's up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn't help if you say "I expect new things to sometimes become possible" if you don't further say something about the impact of the very early versions of the product.

If e.g. people were spending $1B/year developing a technology, and then after a while it jumps from $0/year to $1B/year of profit, I'm not that surprised. (Note that machine translation is radically smaller than this, I don't know the numbers.) I do suspect they could have rolled out a crappy version earlier, perhaps by significantly changing their project. But why would they necessarily bother doing that? For me this isn't violating any of the principles that make your stories sound so crazy.

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world). (Note: it is surprising if an industry is spending $10T/year on R&D and then jumps from $1T --> $10T of revenue in one year in a world that isn't yet growing crazily. How surprising it is depends a lot on the numbers involved, and in particular on how valuable it would have been to deploy a worse version earlier and how hard it is to raise money at different scales.)

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?

Paul Christiano (7d): It's not a description of hominids at all, no one spent any money on R&D. I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).

Spending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I'm not sure why. My guess is that you think natural selection isn't relevantly similar because it didn't deliberately plan to allocate resources as part of a long bet that it would pay off big.

Paul Christiano (7d): I think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead). If someone wants to spell out why they think evolution of hominids means takeoff is fast then I'm usually happy to explain why I disagree with their particular analogy. I think this happens in the next discord log between me and Eliezer.

My uncharitable read on many of these domains is that you are saying "Sure, I think that Paul might have somewhat better forecasts than me on those questions, but why is that relevant to AGI?"

In that case it seems like the situation is pretty asymmetrical. I'm claiming that my view of AGI is related to beliefs and models that also bear on near-term questions, and I expect to make better forecasts than you in those domains because I have more accurate beliefs/models. If your view of AGI is unrelated to any near-term questions where we disagree, then that seems like an important asymmetry.

Inevitably, you can go back afterwards and claim it wasn't really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprising then

It seems like you are saying that there is some measure that was continuous all along, but that it's not obvious in advance which measure was continuous. That seems to suggest that there are a bunch of plausible measures you could suggest in advance, and lots of interesting action will be from changes that are discontinuous changes on some of those measures. Is that right?

If so, don't we get out a ton of predictions? Like, for every particular line someone thinks might be smooth, the gradualist has a higher probability on it being smooth than you would? So why can't I just start naming some smooth lines (like any of the things I listed in the grandparent) and then we can play ball?

If not, what's your position? Is it that you literally can't think of the possible abstractions that would later make the graph smooth? (This sounds insane to me.)

sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2

I find this kind of bluster pretty frustrating and condescending. I also feel like the implication is just wrong---if Eliezer and I disagree, I'd guess it's because he's worse at predicting ML progress. To me GPT-3 feels much (much) closer to my mainline than to Eliezer's, and AlphaGo is very unsurprising. But it's hard to say who was actually "caught flatfooted" unless we are willing to state some of these predictions in advance.

I got pulled into this interaction because I wanted to get Eliezer to make some real predictions, on the record, so that we could have a better version of this discussion in 5 years rather than continuing to both say "yeah, in hindsight this looks like evidence for my view." I apologize if my tone (both in that discussion and in this comment) is a bit frustrated.

It currently feels from the inside like I'm holding the epistemic high ground on this point, though I expect Eliezer disagrees strongly:

• I'm willing to bet on anything Eliezer wants, or to propose my own questions if Eliezer is willing in principle to make forecasts. I expect

I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI - I wouldn't put, eg, Dario Amodei in that class either, though we disagree about other things.

But again, Paul, it's not enough to say that you weren't surprised by GPT-2/3 in retrospect, it kinda is important to say it in advance, ideally where other people can see?  Dario picks up some credit for GPT-2/3 because he clearly called it in advance.  You don't need to find exact disagreements with me to start going on the record as a forecaster, if you think the course of the future is generally narrower than my own guesses - if you think that trends stay on course, where I shrug and say that they might stay on course or break.  (Except that of course in hindsight somebody will always be able to draw a straight-line graph, once they know which graph to draw, so my statement "it might stay on trend or maybe break" applies only to graphs extrapolating into what is currently the future.)

Suppose your view is "crazy stuff happens all the time" and my view is "crazy stuff happens rarely." (Of course "crazy" is my word, to you it's just normal stuff.) Then what am I supposed to do, in your game?

More broadly: if you aren't making bold predictions about the future, why do you think that other people will? (My predictions all feel boring to me.) And if you do have bold predictions, can we talk about some of them instead?

It seems to me like I want you to say "well I think 20% chance something crazy happens here" and I say "nah, that's more like 5%" and then we batch up 5 of those and when none of them happen I get a bayes point.

I could just give my forecast. But then if I observe that 2/20 of them happen, how exactly does that help me in figuring out whether I should be paying more attention to your views (or help you snap out of it)?

I can list some particular past bets and future forecasts, but it's really unclear what to do with them without quantitative numbers or a point of comparison.

Like you I've predicted that AI is undervalued and will grow in importance, although I think I made a much more specific prediction that investment in AI would go up a lot in the short t... (read more)

I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly.  Not on it being useful or profitable, but in terms of underlying technology making progress on non-profitable amazing demo feats, maybe I'm more bullish on theorem-proving than you are?  Is there anything you think it shouldn't be able to do in the next 5 years?

I'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end up disagreeing.

Looking towards more ambitious benchmarks, I think that the IMO grand challenge is currently significantly more than 5 years away. In 5 years' time my median guess (with almost no thinking about it) is that automated solvers can do 10% of non-geometry, non-3-variable-inequality IMO shortlist problems.

So yeah, I'm happy to play ball in this area, and I expect my predictions to be somewhat more right than yours after the dust settles. Is there some way of measuring such that you are willing to state any prediction?

(I still feel like I'm basically looking for any predictions at all beyond sometimes saying "my model ... (read more)

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read.  I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict.

Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events?  As I have trouble manipulating my Paul model, it does not exclude Paul saying, "Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn't a lot of money there."  Though I haven't even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5yrs, I am more interested yet - it legit decrements my belief, but not nearly as much as I imagine it would decrement yours.

(Edit:  Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me?  Am I now being a better epistemic citizen, if maybe not a good one by your lights?)

Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect it to get you that far.

I don't think this is an easy prediction to extract from a trendline, in significant part because you can't extrapolate trendlines this early that far out. So this is stress-testing different parts of my model, which is fine by me.

At the meta-level, this is the kind of thing I'm looking for, though I'd prefer to have some kind of quantitative measure of how not-surprised you are. If you are only saying 2% then we probably want to talk about things less far in your tails than the IMO challenge.

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025.  The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse?  Pretty bare portfolio but it's at least a start in both directions.  If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024?

I also don't know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later - it doesn't sound like it's from looking at a graph - and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here.

EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I'm not taking on.

I think IMO gold medal could be well before massive economic impact, I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it's hard reasoning about the tails.

I'd say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are so surprising.

If I'm at 4% and you are 12% and we had 8 such bets, then I can get a factor of 2 if they all come out my way, and you get a factor of ~1.5 if one of them comes out your way.
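The arithmetic behind those two factors can be checked with a short sketch. It assumes, as a simplification that is not stated in the comment, that the 8 bets are independent and that each forecaster assigns the same probability to every bet; the function name `bayes_factor` is made up for illustration.

```python
def bayes_factor(p_mine: float, p_theirs: float, n_bets: int, n_hits: int) -> float:
    """Likelihood ratio (mine / theirs) after n_bets events, n_hits of which happened.

    Assumes independent bets and a single fixed probability per forecaster.
    """
    n_misses = n_bets - n_hits
    hit_ratio = (p_mine / p_theirs) ** n_hits
    miss_ratio = ((1 - p_mine) / (1 - p_theirs)) ** n_misses
    return hit_ratio * miss_ratio


# The 4% forecaster's gain if none of the 8 events happen.
low_forecaster_no_hits = bayes_factor(0.04, 0.12, n_bets=8, n_hits=0)

# The 12% forecaster's gain if exactly one of the 8 events happens.
high_forecaster_one_hit = bayes_factor(0.12, 0.04, n_bets=8, n_hits=1)

print(f"4% side, 0 of 8 happen:  {low_forecaster_no_hits:.2f}")
print(f"12% side, 1 of 8 happens: {high_forecaster_one_hit:.2f}")
```

Under these assumptions the first factor comes out very close to 2 and the second a little above 1.6, in line with the rough figures quoted. The same function applies to the earlier "20% versus 5%, batched up 5 at a time" example: `bayes_factor(0.05, 0.20, 5, 0)` gives roughly 2.4 in favor of the 5% forecaster if none of the five happen.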

I might think more about this and get a more coherent probability distribution, but unless I say something else by end of 2021 you can consider 4% on end of 2025 my prediction.

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend?  Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general?  Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added:  This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics would not consider you one of their own) where they saunter around saying "X will not occur in the next 5 / 10 / 20 years" and they're often right for the next couple of years, because there's only one year where X shows up for any particular definition of that, and most years are not that year; but also they're saying exactly the same thing up until 2 years before X shows up, if there's any early warning on X at all.  It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn't completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that... (read more)

I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict.

There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orders of magnitude, and there is likely lots of low-hanging fruit to pick, and we just don't have much to extrapolate from (compared to more mature technologies, or how I expect AI will be shortly before the end of days), and for similar reasons there aren't really any benchmarks to extrapolate.

(Also note that it matters a lot whether you know what problems labs will try to take a stab at. For the purpose of all of these forecasts, I am trying insofar as possible to set aside all knowledge about what labs are planning to do though that's obviously not incentive-compatible and there's no particular reason you should trust me to do that.)

I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024

Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.

I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.

Ha!  Okay then.  My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.

I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.

Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and modest effort). Together with modest progress in deep learning based methods, and a somewhat serious effort, I wouldn't be surprised by people getting up to 20-40% of problems. The bronze cutoff is usually 3/6 problems, and the gold cutoff is usually 5/6 (assuming the AI doesn't get partial credit). The difficulty of problems also increases very rapidly for humans---there are often 3 problems that a human can do more-or-less mechanically.

I could tighten any of these estimates by looking at the distribution more carefully rather than going off of my recollections from 2008, and if this was going to be one of a handful of things we'd bet about I'd probably spend a few hours doing that and some other basic digging.

I looked at a few recent IMOs to get better calibrated. I think the main update is that I underestimated how many years you can get a gold with only 4/6 problems. There are also a lot of years where I think a machine might be able to get 4/6. That said, they do have to get lucky with IMO content, and someone has to make a serious and mostly-successful effort, but I'm at least a bit scared by that.

Might be good to make some side bets:

• Conditioned on winning I think it's only maybe 20% probability to get all 6 problems (whereas I think you might have a higher probability on jumping right past human level, or at least have 50% on 6 vs 5?).
• Conditioned on a model getting 3+ problems I feel like we have a pretty good guess about what algorithm will be SOTA on this problem (e.g. I'd give 50% to a pretty narrow class of algorithms with some uncertain bells and whistles, with no inside knowledge). Whereas I'd guess you have a much broader distribution.

But more useful to get other categories of bets. (Maybe in programming, investment in AI, economic impact from robotics, economic impact from chatbots, translation?)

Matthew Barnett (7d): It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equivalent of a Gold medal).

The relevant question is which rank/percentile you think is likely to be attained by 2025 under your model but you predict would be implausible under Paul's model. This may be a daunting task, but one way to get started is to put a probability distribution over what you think the state-of-the-art will look like by 2025, and then compare to Paul's.

Edit: Here are, for example, the individual rankings for 2021: https://www.imo-official.org/year_individual_r.aspx?year=2021

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%.  Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict.  Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%.  That's a separate matter.  I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Rob Bensinger (7d): My model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'. My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the blue'. Again, I may be wrong, but my intuition is that you're Paul-omorphizing Eliezer when you assume that >16% probability of huge progress in X by year Y implies >50% probability of smaller-but-meaningful progress in X by year Y.
Matthew Barnett (7d): If this task is bad for operationalization reasons, there are other theorem proving benchmarks [https://paperswithcode.com/task/automated-theorem-proving]. Unfortunately it looks like there aren't a lot of people that are currently trying to improve on the known benchmarks, as far as I'm aware. The code generation benchmarks [https://paperswithcode.com/task/code-generation] are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark [https://arxiv.org/pdf/2105.09938v3.pdf], which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (GitHub link [https://github.com/hendrycks/apps]).
Paul Christiano (9h): I think Metaculus is closer to Eliezer here: conditioned on this problem being resolved it seems unlikely for the AI to be either open-sourced or easily reproducible.
Matthew Barnett (9h): My honest guess is that most predictors didn’t see that condition and the distribution would shift right if someone pointed that out in the comments.

To me GPT-3 feels much (much) closer to my mainline than to Eliezer's

To add to this sentiment, I'll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing.

The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It would have taken roughly one or two more years at ordinary progress to get to that point anyway -- which I just don't see as being all that impressive.
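One way to make a claim like "one or two years ahead of trend" concrete is to fit a line to earlier results and ask when that line would have reached the new score. The sketch below does this with ordinary least squares. The numbers are made up for illustration and are not real Penn Treebank results, the function names are hypothetical, and a linear trend in the raw score is itself an assumption (a log scale may fit real data better).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def years_ahead_of_trend(history, new_year, new_score):
    """Years between the new result and the date the fitted trend reaches it.

    history: list of (year, score) pairs observed before the new result.
    A positive return value means the result arrived early.
    """
    slope, intercept = fit_line([y for y, _ in history],
                                [s for _, s in history])
    trend_year = (new_score - intercept) / slope
    return trend_year - new_year


# Hypothetical scores (lower is better), NOT real benchmark data.
hypothetical_history = [(2014, 80.0), (2015, 74.0), (2016, 68.0),
                        (2017, 62.0), (2018, 56.0), (2019, 50.0)]

print(years_ahead_of_trend(hypothetical_history, new_year=2020, new_score=35.0))
```

The answer depends heavily on which points go into the fit and on the scale used for the score, which is part of what the replies below argue about.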

I sorta feel like the main reason why lots of people found GPT-3 so impressive was because OpenAI was just good at marketing the results [ETA: sorry, I take back the use of the word "marketing"]. Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to work really well for them.

I admit this is an uncharitable explanation, but is there a better story to tell about why GPT-3 captured so much attention?

The impact of GPT-3 had nothing whatsoever to do with its perplexity on Penn Treebank. I think this is a good example of why focusing on perplexity and 'straight lines on graph go brr' is so terrible, such cargo cult mystical thinking, and crippling. There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it! They didn't even tweet the paper! (Forget an OA blog post, accompanying NYT/TR articles, tweets by everyone at OA, a fancy interactive interface - none of that.) And most of the initial reaction was "GPT-3: A Disappointing Paper"-style. If this is marketing genius, then it is truly 40-d chess, is all I can say.

The impact of GPT-3 was in establishing that trendlines did continue in a way that shocked pretty much everyone who'd written off 'naive' scaling strategies. Progress is made out of stacked sigmoids: if the next sigmoid doesn't show up, progress doesn't happen. Trends happen, until they stop. Trendlines are not caused by the laws of physics. You can dismiss AlphaGo by saying "oh, that just continues the trendline in... (read more)

And to say it also explicitly, I think this is part of why I have trouble betting with Paul.  I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day.  If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that's something I don't care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwern voice is asking.  What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?  Isn't this sort of a shell game where our surface capabilities do weird jumpy things, we can point to some trend lines that were nonetheless smooth, and then the shells are swapped and we're told to expect gradualist AGI surface stuff?  This is part of the idea that I'm referring to when I say that, even as the world ends, maybe there'll be a bunch of smooth trendlines underneath it that somebody could look back and point out.  (Which you could in fact have used to predict all the key jumpy surface thresholds, if you'd watched it all happen on a few other planets and had any idea of where jumpy surface events were located on the smooth trendlines - but we haven't watched it happen on other planets so the trends don't tell us much we want to know.)

This seems totally bogus to me.

It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them. To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care about. That's fine, you can propose something other than $ or measurable benchmarks. If you can't propose anything, I'm skeptical. My basic guess is that you probably can't effectively predict $ or benchmarks or anything else quantitative. If you actually agreed with me on all that stuff, then I might suspect that you are equivocating between a gradualist-like view that you use for making predictions about everything near term and then switching to a more bizarre perspective when talking about the future. But fortunately I think this is more straightforward, because you are basically being honest when you say that you don't understand how the gradualist perspective makes predictions.

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."  We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes, what we're interested in are milestones, and often milestones that aren't easy to define even after the fact.  GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk about that with somebody who thinks that perplexity loss is smooth?  I can handwave statements like "GPT-3 started to be useful without retraining via just prompt engineering" but qualitative statements like those aren't good for betting, and it's much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards.

But you say - I think? - that yo... (read more)

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."

Don't you think you're making a falsifiable prediction here?

Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.

What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?

Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.

I would generally be surprised if there were many highly non-linear relationships between perplexity and something like Winograd accuracy, human evaluation, or whatever other concrete measure you can come up with, such that the underlying behavior of the surface phenomenon is best described as a discontinuity with the past even when the latent perplexity changed smoothly. I admit the existence of some measures that exhibit these qualities (such as, potentially, the ability to do arithmetic), but I expect them to be quite a bit harder to find than the reverse.
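For a sense of what one of the exceptional measures might look like, here is a toy sketch. It assumes, purely for illustration, that an answer is scored as correct only if every one of its tokens is right and that token errors are independent. Neither assumption is taken from the thread or from any real benchmark, and `exact_match_rate` is a hypothetical name.

```python
def exact_match_rate(per_token_accuracy, answer_length):
    """Chance of getting every token right, assuming independent errors."""
    return per_token_accuracy ** answer_length


# The latent metric improves in equal steps; the all-or-nothing score does not.
for step in range(6):
    p = 0.5 + 0.1 * step
    print(f"per-token {p:.1f} -> 10-token exact match {exact_match_rate(p, 10):.3f}")
```

Under these assumptions the all-or-nothing score stays near zero for most of the range and then climbs steeply, even though the underlying quantity moved in equal steps. Whether real task measures behave this way is the empirical question at issue.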

Furthermore, it seems like if this is the crux — ie. that surface-level qualitative phenomena will experience discontinuities even while ... (read more)

Well put / endorsed / +1.

I think that most people who work on models like GPT-3 seem more interested in trendlines than you do here.

That said, it's not super clear to me what you are saying so I'm not sure I disagree. Your narrative sounds like a strawman since people usually extrapolate performance on downstream tasks they care about rather than on perplexity. But I do agree that the updates from GPT-3 are not from OpenAI's marketing but instead from people's legitimate surprise about how smart big language models seem to be.

As you say, I think the interesting claim in GPT-3 was basically that scaling trends would continue, where pessimists incorrectly expected they would break based on weak arguments. I think that looking at all the graphs, both of perplexity and performance on individual tasks, helps establish this as the story. I don't really think this lines up with Eliezer's picture of AGI but that's presumably up for debate.

There are always a lot of people willing to confidently decree that trendlines will break down without much argument. (I do think that eventually the GPT-3 trendline will break if you don't change the data, but for the boring reason that the entropy of natural language will eventually dominate the gradient noise and so lead to a predictable slowdown.)

There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it!

Yeah, my phrasing there was not ideal. I regret using the word "marketing", but to be fair, I mostly meant what I said in the next few sentences, "Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to work really well for them."

Of course, seeing that such an opportunity exists is itself laudable and I give them Bayes points for realizing that scaling laws are important. At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?

I do agree that it's extremely interesting to know why the lines go straight. I feel like I wasn't trying to say that GPT-3 wasn't intrinsically interesting. I was more saying it wasn't unpredictable, in the sense that Paul Christiano would have strongly said "no I do not expect that to happen" in 2018.

Again, the fact that it is a straight line on a metric which is, if not meaningless, extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit? That is, before they knew about how interesting the consequences would be of that small change in BPC?

At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?

Who's 'we', exactly? Who are these people who expected all of this to happen, and are going around saying "ah yes, these BIG-Bench results are exactly as I calculated back in 2018, the capabilities are all emerging like clockwork, each at their assigned BPC; next is capability Z, obviously"? And what are they saying about 500b, 1000b, and so on?

I was more saying it wasn't unpredictable, in the sense that Paul Christiano would have strongly said "no I do not expect that to happen" in 2018.

OK. So can you link me to someone saying in 2018 that we'd see GPT-2-1.5b's behavior at ~1.5b parameters, and that we'd get few-shot metalearning and instructability past that with another OOM? And while you're at it, if it's so predictable, please answer all the other... (read more)

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance. Qualitative capabilities are more relevant to safety than benchmark performance is, because while qualitative capabilities include things like "code a simple video game" and "summarize movies with emojis", they also include things like "break out of confinement and kill everyone". It's the latter capability, and not PTB performance, that you'd need to predict if you wanted to reliably stay out of the x-risk regime — and the fact that we can't currently do so is, I imagine, what brought to mind the analogy between scaling and Russian roulette.

I.e., a straight line in domain X is indeed not surprising; what's surprising is the way in which that straight line maps to the things we care about more than X.

(Usual caveats apply here that I may be misinterpreting folks, but that is my best read of the argument.)

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance.

This is a reasonable thesis, and if indeed it's the one Gwern intended, then I apologize for missing it!

That said, I have a few objections:

• Isn't it a bit suspicious that the thing-that's-discontinuous is hard to measure, but the-thing-that's-continuous isn't? I mean, this isn't totally suspicious, because subjective experiences are often hard to pin down and explain using numbers and statistics. I can understand that, but the suspicion is still there.
• "No one predicted X in advance" is only damning to a theory if people who believed that theory were making predictions about it at all. If people who generally align with Paul Christiano were indeed making predictions to the effect of GPT-3 capabilities being impossible or very unlikely within a narrow future time window, then I agree that would be damning to Paul's worldview. But -- and maybe I missed something -- I didn't see that. Did you?
• There seems to be an implicit claim that Pa

it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics!

That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I'm not going to look up; pretend I did that.) So on a graph of "language model of type X revenue over time",  total revenue is static at 0 for a long time and then shortly after GPT-3's creation departs from 0.

It seems like the fact that GPT-3 could be commercialized in this way when GPT-2 couldn't is a result of something that Penn Treebank perplexity is sort of pointing at. (That is, it'd be hard to get a model with GPT-3's commercializability but GPT-2's Penn Treebank score.) But what we need in order for the straight line on PTB to be useful as a model for predicting revenue i... (read more)

Matthew Barnett (7d): I think it's the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue at some point. It's an interesting question of when that will happen, and maybe your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend. However, I suspect that the revenue curve will look pretty continuous, now that it's gone from zero to one. Do you disagree? In a world with continuous, gradual progress across a ton of metrics, you're going to get discontinuities from zero to one. I don't think anyone from the Paul camp disagrees with that (in fact, Katja Grace talked about this [https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/#Starting_high] in her article). From the continuous takeoff perspective, these discontinuities don't seem very relevant unless going from zero to one is very important in a qualitative sense. But I would contend that going from "no revenue" to "some revenue" is not actually that meaningful in the sense of distinguishing AI from the large class of other economic products that have gradual development curves.

your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.

This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]

For example, I imagine that, by thirty years from now, we'll have language/code models that can do significant security analysis of the code that was available in 2020, and that this would have been highly relevant/valuable to people in 2020 interested in computer security. But when will this happen in the 2020-2050 range that seems likely to me? I'm pretty uncertain, and I expect this to look a lot like 'flicking a switch' in retrospect, even tho the leadup to flicking that switch will probably look like smoothly increasing capabilities on 'toy' problems.

[My current guess is that Paul / people in "Paul's camp" would mostly agree with the prev... (read more)

Edouard Harris (7d): Yeah, these are interesting points. I sympathize with this view, and I agree there is some element of truth to it that may point to a fundamental gap in our understanding (or at least in mine). But I'm not sure I entirely agree that discontinuous capabilities are necessarily hard to measure: for example, there are benchmarks [https://github.com/openai/grade-school-math] available for things like arithmetic, which one can train on and make quantitative statements about. I think the key to the discontinuity question is rather that 1) it's the jumps in model scaling that are happening in discrete increments; and 2) everything is S-curves, and a discontinuity always has a linear regime if you zoom in enough. Those two things together mean that, while a capability like arithmetic might have a continuous performance regime on some domain, in reality you can find yourself halfway up the performance curve in a single scaling jump (and this is in fact what happened with arithmetic and GPT-3). So the risk, as I understand it, is that you end up surprisingly far up the scale of "world-ending" capability from one generation to the next, with no detectable warning shot beforehand. No, you're right as far as I know; at least I'm not aware of any such attempted predictions. And in fact, the very absence of such prediction attempts is interesting in itself. One would imagine that correctly predicting the capabilities of an AI from its scale ought to be a phenomenally valuable skill, not just from a safety standpoint, but from an economic one too. So why, indeed, didn't we see people make such predictions, or at least try to? There could be several reasons. For example, perhaps Paul (and other folks who subscribe to the "continuum" world-model) could have done it, but they were unaware of the enormous value of their predictive abilities. That seems implausible, so let's assume they knew the value of such predictions would be huge.
But if you know the value of doing something i

So... I totally think there are people who sort of nod along with Paul, using it as an excuse to believe in a rosier world where things are more comprehensible and they can imagine themselves doing useful things without having a plan for solving the actual hard problems. Those types of people exist. I think there's some important work to be done in confronting them with the hard problem at hand.

But, also... Paul's world AFAICT isn't actually rosier. It's potentially more frightening to me. In Smooth Takeoff world, you can't carefully plan your pivotal act with an assumption that the strategic landscape will remain roughly the same by the time you're able to execute on it. Surprising partial-gameboard-changing things could happen that affect what sort of actions are tractable. Also, dumb, boring ML systems run amok could kill everyone before we even get to the part where recursive self improving consequentialists eradicate everyone.

I think there is still something seductive about this world – dumb, boring ML systems run amok feels like the sort of problem that is easier to reason about and maybe solve. (I don't think it's actually necessarily easier to solve, but I think it ca... (read more)

My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not certain--maybe 33%?) that once you have the right phase transition to kick off the rest, everything else happens pretty quickly (within a few days).

Is this take more consistent with Paul's or Eliezer's? I'm not totally sure. I'd guess closer to Paul's, but maybe the "1 day" world is consistent with Eliezer's?

(One candidate for the "big" phase transition would be if the model figures out how to go off and learn on its own, so that number of SGD updates is no longer the primary bottleneck on model capabilities. But I could also imagine us getting that even when models are still fairly "dumb".)

your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it's likely that we will move slowly through the human range on many tasks

We might be able to falsify that in a few months.

There is a joint Google / OpenAI project called BIG-bench. They've crowdsourced ~200 highly diverse text tasks (from answering scientific questions to predicting protein interacting sites to measuring self-awareness).

One of the goals of the project is to see how the performance on the tasks is changing with the model size, with the size ranging by many orders of magnitude.

A half-year ago, they presented some preliminary results. A quick summary:

if you increase the N of parameters from 10^7 to 10^10, the aggregate performance score grows roughly like log(N).

But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

The paper with the full results is expected to be published in the next few months.

Judging by the preliminary results, the FOOM could start like this:

The GPT-5 still sucks on most tasks. It's mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?

Hot damn, where can I see these preliminary results?

The results were presented at a workshop by the project organizers. The video from the workshop is available here (the most relevant presentation starts at 5:05:00).

It's one of those innocent presentations that, after you understand the implications, keep you awake at night.

Presumably you're referring to this graph. The y-axis looks like the kind of score that ranges between 0 and 1, in which case this looks sort-of like a sigmoid to me, which accelerates when it gets closer to ~50% performance (and decelerates when it gets closer to 100% performance).

If so, we might want to ask whether these tasks are chosen ~randomly (among tasks that are indicative of how useful AI is) or if they're selected for difficulty in some way. In particular, assume that most tasks look sort-of like a sigmoid as they're scaled up (accelerating around 50%, improving slower when they're closer to 0% and 100%). Then you might think that the most exciting tasks to submit to BIG-bench would be the tasks that can't be handled by small models, but that large models rapidly improve upon (as opposed to tasks that are basically-solved already by 10^10 parameters). In which case the aggregation of all these tasks could be expected to look sort-of like this, improving faster after 10^10 than before.

...is one story I can tell, but idk if I would have predicted that beforehand, and fast acceleration after 10^10 is certainly consistent with many people's qualitative impressions of GPT-3. So maybe there is some real acceleration going on.
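The selection story above can be sketched numerically. The code below assumes each task's score is a logistic function of log10(parameter count) and that the task mix is skewed toward tasks whose midpoint lies above 10^10 parameters. The midpoints, the width, and the function names are all hypothetical, chosen only to illustrate the argument, and say nothing about the actual BIG-bench tasks.

```python
import math


def sigmoid_score(log10_params, midpoint, width=0.5):
    """Score in [0, 1] for one task, logistic in log10(parameter count)."""
    return 1.0 / (1.0 + math.exp(-(log10_params - midpoint) / width))


def aggregate_score(log10_params, midpoints):
    """Unweighted mean score over all tasks."""
    total = sum(sigmoid_score(log10_params, m) for m in midpoints)
    return total / len(midpoints)


# Hypothetical task mix: two tasks that small models mostly handle,
# twelve tasks whose midpoint sits above 10^10 parameters.
selected_for_difficulty = [8.0, 9.0] + [10.5, 11.0, 11.5, 12.0] * 3

for log_n in range(7, 13):
    score = aggregate_score(log_n, selected_for_difficulty)
    print(f"10^{log_n} parameters: aggregate {score:.3f}")
```

With this mix, the aggregate gains more per order of magnitude after 10^10 than before it, even though every individual task follows the same smooth curve shape. That shows the selection effect could produce the pattern; it does not show that it did.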

(Also, see this post for similar curves, but for the benchmarks that OpenAI tested GPT-3 on. There's no real acceleration visible there, other than for arithmetic.)

But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

Seems interestingly similar to the grokking phenomenon.

Survey on model updates from reading this post. Figuring out to what extent this post has led people to update may inform whether future discussions are valuable.

Results: (just posting them here, doesn't really need its own post)

The question was to rate agreement on the 1=Paul to 9=Eliezer axis before and after reading this post.

Data points: 35

Mean:

Median:

Raw Data

Agreement more on need for actions than on probabilities. Would be better to first present points of agreement (that it is at least possible for non(dangerously)-general AI to change situation).

the post was incredibly confusing to me and so I haven't really updated at all because I don't feel like I can crisply articulate yudkowsky's model or his differences with christiano

Daniel Kokotajlo (7d): Wow, I did not expect those results!

I wonder what effect there is from selecting for reading the third post in a sequence of MIRI conversations from start to end and also looking at the comments and clicking links in them.

Edouard Harris (8d): (Not being too specific to avoid spoilers) Quick note: I think the direction of the shift in your conclusion might be backwards, given the statistics you've posted and that 1=Eliezer and 9=Paul.
Lukas Finnveden (8d): No, the form says that 1=Paul. It's just the first sentence under the spoiler that's wrong.
Edouard Harris (8d): Good catch! I didn't check the form. Yes you are right, the spoiler should say (1=Paul, 9=Eliezer) but the conclusion is the right way round.
Rafael Harth (8d): Yeah, it's fixed now. Thanks for pointing it out.
Ben Pace (8d): How interesting; I am the median.

[ETA: In light of pushback from Rob: I really don't want this to become a self-fulfilling prophecy. My hope in making this post was to make the prediction less likely to come true, not more! I'm glad that MIRI & Eliezer are publicly engaging with the rest of the community more again, I want that to continue, and I want to do my part to help everybody to understand each other.]

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope. There are not obvious things I can do, let alone arguments I can make, which I expect to be actually useful in the sense that the world will not end once I do them. I don't have the energy left for calm arguments. What's left is despair that can be given voice.

I grimly predict that the effect of this dialogue on the community will be polarization: People who didn't like Yudkowsky and/or his views will like him / his views less, and the gap between them and Yud-fans will grow (more than it shrinks due to the effect of increased dialogue). I say this because IMO Yudkowsky comes across as angry and uncharitable in various parts of ... (read more)

I grimly predict that the effect of this dialogue on the community will be polarization

Beware of self-fulfilling prophecies (and other premature meta)! If both sides in a dispute expect the other side to just entrench, then they're less likely to invest the effort to try to bridge the gap.

This very comment section is one of the main things that will determine the community's reaction, and diverting our focus to 'what will our reaction be?' before we've talked about the object-level claims can prematurely lock in a certain reaction.

(That said, I think you're doing a useful anti-polarization thing here, by showing empathy for people you disagree with, and showing willingness to criticize people you agree with. I don't at all dislike this comment overall; I just want to caution against giving up on something before we've really tried. This is the first proper MIRI-response to Paul's takeoff post, and should be a pretty big update for a lot of people -- I don't think people were even universally aware that Eliezer endorses hard takeoff anymore, much less aware of his reasoning.)

Fair enough! I too dislike premature meta, and feel bad that I engaged in it. However... I do still feel like my comment probably did more to prevent polarization than cause it? That's my independent impression at any rate. (For the reasons you mention).

I certainly don't want to give up! In light of your pushback I'll edit to add something at the top.

Adam Shimi: Strongly agree with that. Since you agree with Yudkowsky, do you think you could strongman his position?

Yes, though I'm much more comfortable explaining and arguing for my own position than EY's. It's just that my position turns out to be pretty similar. (Partly this is independent convergence, but of course partly this is causal influence since I've read a lot of his stuff.)

There's a lot to talk about, I'm not sure where to begin, and also a proper response would be a whole research project in itself. Fortunately I've already written a bunch of it; see these two sequences.

Here are some quick high-level thoughts:

1. Begin with timelines. The best way to forecast timelines IMO is Ajeya's model; it should be the starting point and everything else should be adjustments from it. The core part of Ajeya's model is a probability distribution over how many OOMs of compute we'd need with today's ideas to get to TAI / AGI / APS-AI / AI-PONR / etc. [Unfamiliar with these acronyms? See Robbo's helpful comment below] For reasons which I've explained in my sequence (and summarized in a gdoc) my distribution has significantly more mass on the 0-6 OOM range than Paul's does, and less on the 13+ range. The single post that conveys this intuition most is Fun with +12 OOMs.

Now consider how takeoff speed v...

I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling.

So, there's this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then people's brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won't immediately see how they're relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alignment or timelines or whatever, using an argument which relies on those upstream gears, and it goes right over a bunch of people's heads because they don't have that upstream gear in their world-models.

For the sort of argument in this post, it's even worse, because a lot of people aren't even explicitly aware that the relevant type of gear is a thing, or how to think about it beyond a rough intuitive level.

I first ran into this problem in t...

The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and Dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere". Rounding off the slow takeoff hypothesis to "lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time" seems like black-and-white thinking, demanding that the future either be perfectly Thielian or perfectly anti-Thielian. The real question is a quantitative one: how lumpy will takeoff be?

it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.

Unfortunately, it looks like Yudkowsky and Christiano weren't able to come to an agreement on what bets to make.

In place of that, I'll ask, whatever camp you belong to: what concrete predictions do you make that you believe most strongly diverge from what people in the "other" camp believe, and can be resolved substantially before the world ends?

I propose we restrict our predictions to roughly 2026, which is pretty soon but probably not world-ending-soon (on almost all views).

Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior.

Nitpick: I think that particular analogy isn't great.

For nuclear stuff, we have two state variables: amount of fissile material and current number of neutrons flying around. The amount of fissile material determines the "neutron multiplication factor", but it is the number of neutrons that goes crazy, not fissile material. And the current number of neutrons doesn't matter f...
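To make the quoted numbers concrete, here is a minimal sketch of the simplest possible toy model, assuming the multiplication factor k is held fixed and only the neutron count changes from one generation to the next. The function name, the starting population, and the choice of 10,000 generations are illustrative assumptions, not anything specified in the thread.

```python
def neutron_population(k: float, generations: int, n0: float = 1.0) -> float:
    """Neutron count after `generations` steps of N_{t+1} = k * N_t.

    k is the (fixed) neutron multiplication factor; n0 is the starting count.
    This is a toy geometric model, not a reactor simulation.
    """
    return n0 * k ** generations


# Hypothetical horizon chosen only to make the divergence visible.
GENERATIONS = 10_000

subcritical = neutron_population(0.999, GENERATIONS)
supercritical = neutron_population(1.001, GENERATIONS)

print(f"k = 0.999 after {GENERATIONS} generations: {subcritical:.3e}")
print(f"k = 1.001 after {GENERATIONS} generations: {supercritical:.3e}")
```

Under these assumptions the k = 0.999 population decays toward zero while the k = 1.001 population grows without bound, even though the two factors differ by only 0.002. The sketch deliberately ignores the second state variable (amount of fissile material) that the comment above points to.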

"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.

[This comment is no longer endorsed by its author]

I read "Takeoff Speeds" at the time.  I did not liveblog my reaction to it at the time.  I've read the first two other items.

I flag your weirdly uncharitable inference.

I apologize, I shouldn't have leapt to that conclusion.

Apology accepted.

FWIW, I did not find this weirdly uncharitable, only mildly uncharitable. I have extremely wide error bars on what you have and have not read, and "Eliezer has not read any of the things on that list" was within those error bars. It is really quite difficult to guess your epistemic state w.r.t. specific work when you haven't been writing about it for a while.

(Though I guess you might have been writing about it on Twitter? I have no idea, I generally do not use Twitter myself, so I might have just completely missed anything there.)

The "weirdly uncharitable" part is saying that it "seemed like" I hadn't read it vs. asking.  Uncertainty is one thing, leaping to the wrong guess another.

Yeah, even I wasn't sure you'd read those three things, Eliezer, though I knew you'd at least glanced over 'Takeoff Speeds' and 'Biological Anchors' enough to form opinions when they came out. :)