Eliezer Yudkowsky

Yudkowsky and Christiano discuss "Takeoff Speeds"

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend?  Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general?  Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added:  This question fits into a larger concern I have about AI soberskeptics in general (not you; the soberskeptics would not consider you one of their own) where they saunter around saying "X will not occur in the next 5 / 10 / 20 years" and they're often right for the next couple of years, because there's only one year where X shows up for any particular definition of that, and most years are not that year; but also they're saying exactly the same thing up until 2 years before X shows up, if there's any early warning on X at all.  It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn't completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that definitely couldn't be done in 2 years, and they all fell silent, and then a single one of them named Winograd schemas, they were right that Winograd schemas at the stated level didn't fall within 2 years, but very barely so (they fell the year after).  So part of what I'm flailingly asking here is whether you think you have reliable and sensitive Nope Vision that extends out beyond 2 years, in general, such that you can go on saying "Not for 4 years" up until we are actually within 6 years of the thing, and then, you think, your Nope Vision will actually flash an alert and you will change your tune, before you are actually within 4 years of the thing.  Or maybe you think you've got Nope Vision extending out 6 years?  10 years?  Or maybe theorem-proving is just a special case and usually your Nope Vision would be limited to 2 years or 3 years?

This is all an extremely Yudkowskian frame on things, of course, so feel free to reframe.

Christiano, Cotra, and Yudkowsky on AI progress

I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?

Yudkowsky and Christiano discuss "Takeoff Speeds"

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025.  The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse?  Pretty bare portfolio but it's at least a start in both directions.  If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024?

I also don't know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later - it doesn't sound like it's from looking at a graph - and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here.

EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I'm not taking on.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%.  Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict.  Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%.  That's a separate matter.  I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Ha!  Okay then.  My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on e.g. solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.

Christiano, Cotra, and Yudkowsky on AI progress

Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day.  The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" and then I say "Starting in 2022 would not surprise me" by way of making an antiprediction that contradicts them.  It may sound bold and startling to them, but from my own perspective I'm just expressing my ignorance.  That's one reason why I keep saying, if you think the world more orderly than that, why not opine on it yourself to get the Bayes points for it - why wait for me to ask you?

If you ask me to extend out a rare tendril of guessing, I might guess that GPT-3's current text prediction-hence-production capabilities are sufficiently good that somewhere inside GPT-3 there must be represented a level of understanding which should also suffice to, for example, translate Chinese to English or vice versa in a way that comes out sounding like a native speaker and is recognized as basically faithful to the original meaning.  We haven't figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we've already applied using the right loss functions.

So there's a qualitative guess at a surface capability we might see soon - but when is "soon"?  I don't know; history suggests that even what predictably happens later is extremely hard to time.  There are subpredictions of the Yudkowskian imagery that you could extract from here, including such minor, perhaps-wrong, but still suggestive implications as "170B weights is probably enough for this first amazing translator, rather than it being a matter of somebody deciding to expend 1.7T (non-MoE) weights, once they figure out the underlying setup and how to apply the gradient descent" and "the architecture can potentially look like somebody Stacked More Layers and like it didn't need key architectural changes of the kind Yudkowsky suspects may be needed to go beyond GPT-3 in other ways" and "once things are sufficiently well understood, it will look clear in retrospect that we could've gotten this translation ability in 2020 if we'd spent compute the right way".

It is, alas, nowhere written in this prophecy that we must see even more un-Paul-ish phenomena, like translation capabilities taking a sudden jump without intermediates.  Nothing rules out a long wandering road to the destination of good translation in which people figure out lots of little things before they figure out a big thing, maybe to the point of nobody figuring out until 20 years later the simple trick that would've gotten it done in 2020, à la ReLUs vs. sigmoids.  Nor can I say that such a thing will happen in 2022 or 2025, because I don't know how long it takes to figure out how to do what you clearly ought to be able to do.

I invite you to express a different take on machine translation; if it is narrower, more quantitative, more falsifiable, and doesn't achieve this just by narrowing its focus to metrics whose connection to the further real-world consequences is itself unclear, and then it comes true, you don't need to have explicitly bet against me to have gained more virtue points.

Christiano, Cotra, and Yudkowsky on AI progress

If they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.

Yudkowsky and Christiano discuss "Takeoff Speeds"

(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read.  I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict.

Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events?  As I have trouble manipulating my Paul model, it does not exclude Paul saying, "Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn't a lot of money there."  Though I haven't even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5 years, I am more interested yet - it legit decrements my belief, but not nearly as much as I imagine it would decrement yours.

(Edit:  Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me?  Am I now being a better epistemic citizen, if maybe not a good one by your lights?)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."  We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes, what we're interested in are milestones, and often milestones that aren't easy to define even after the fact.  GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk about that with somebody who thinks that perplexity loss is smooth?  I can handwave statements like "GPT-3 started to be useful without retraining via just prompt engineering" but qualitative statements like those aren't good for betting, and it's much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards.

But you say - I think? - that you were less shocked by this sort of thing than I am.  So, I mean, can you prophesy to us about milestones and headlines in the next five years?  I think I kept thinking this during our dialogue, but never saying it, because it seemed like such an unfair demand to make!  But it's also part of the whole point that AGI and superintelligence and the world ending are all qualitative milestones like that.  Whereas such trend points as Moravec was readily able to forecast correctly - like 10 teraops / plausibly-human-equivalent-computation being available in a $10 million supercomputer around 2010 - are really entirely unanchored from AGI, at least relative to our current knowledge about AGI.  (They would be anchored if we'd seen other planets go through this, but we haven't.)
