Introduction

How many years will pass before transformative AI is built? Three people who have thought about this question a lot are Ajeya Cotra from Open Philanthropy, Daniel Kokotajlo from OpenAI, and Ege Erdil from Epoch. Despite each spending at least hundreds of hours investigating this question, they still disagree substantially about the relevant timescales. For instance, here are their median timelines for one operationalization of transformative AI:

Median estimate for when 99% of currently fully remote jobs will be automatable:

  • Daniel: 4 years
  • Ajeya: 13 years
  • Ege: 40 years

You can see the strength of their disagreements in the graphs below, where they give very different probability distributions over two questions relating to AGI development (note that these graphs are very rough and are only intended to capture high-level differences, and especially aren't very robust in the left and right tails).

In what year would AI systems be able to replace 99% of current fully remote jobs?
Median indicated by small dotted line.
In what year will the energy consumption of humanity or its descendants be 1000x greater than now?
Median indicated by small dotted line. Note that Ege's median, 2177, is outside the bounds of the graph.

So I invited them to have a conversation about where their disagreements lie, sitting down for 3 hours to have a written dialogue. You can read the discussion below, which I personally found quite valuable.

The dialogue is roughly split in two, with the first part focusing on disagreements between Ajeya and Daniel, and the second part focusing on disagreements between Daniel/Ajeya and Ege.

I'll summarize the discussion here, but you can also jump straight in.

Summary of the Dialogue

Some Background on their Models

Ajeya and Daniel both use a compute-centric model for their AI forecasts, illustrated by Ajeya's draft AI Timelines report and Tom Davidson's takeoff model, in which the question "when will we get transformative AI?" reduces to "how much compute is necessary to get AGI, and when will we have that much compute?" (with algorithmic advances modeled as reductions in the necessary compute).

Ege, by contrast, thinks such models should get a lot of weight in our forecasts, but that they likely miss important considerations and don't come with enough evidence to justify the extraordinary predictions they make.

Habryka's Overview of Ajeya & Daniel discussion

  • Ajeya thinks translating AI capabilities into commercial applications has gone slower than expected ("it seems like 2023 brought the level of cool products I was naively picturing in 2021") and similarly thinks there will be a lot of kinks to figure out before AI systems can substantially accelerate AI development.
  • Daniel agrees that impactful commercial applications have arrived more slowly than expected, but thinks the work that made that slow can itself be substantially automated, and that a lot of the complexity comes from shipping something useful to general consumers; for applications internal to the company, these capabilities can be unlocked faster.
  • Compute overhangs also play a big role in the differences between Ajeya and Daniel's timelines. There is currently substantial room to scale up AI by just spending more money on readily available compute. However, within a few years, increasing the amount of training compute further will require accelerating the semiconductor supply chain, which probably can't be easily achieved by just spending more money. This creates a "compute overhang" that accelerates AI progress substantially in the short run. Daniel thinks it's more likely than not that we will get transformative AI before this compute overhang is exhausted. Ajeya thinks that is plausible, but overall it's more likely to happen after, which broadens her timelines quite a bit.

These disagreements probably explain some but not most of the differences in the timelines for Daniel and Ajeya.

Habryka's Overview of Ege & Ajeya/Daniel Discussion

  • Ege thinks that Daniel's forecast leaves very little room for Hofstadter's law ("It always takes longer than you expect, even when you take into account Hofstadter's Law"), and, in general, that a bunch of unexpected things will go wrong on the path to transformative AI.
  • Daniel thinks that Hofstadter's law is inappropriate for trend extrapolation, i.e. it doesn't make sense to look at Moore's law and say "ah, because of the planning fallacy, the slope of this graph going forward will be half of what it was previously."
  • Neither Ege nor Ajeya expects a large increase in transfer-learning ability in the next few years. For Ege this matters a lot, because it's one of his top reasons for thinking AI will not speed up the economy and AI development all that much. Ajeya thinks we can probably speed up AI R&D anyway by building AI that doesn't transfer as well as humans do, but is just really good at ML engineering and AI R&D because it was directly trained to be.
  • Ege expects that AI will have a large effect on the economy, but has substantial probability on persistent deficiencies that prevent AI from fully automating AI R&D or very substantially accelerating semiconductor progress.

Overall, seeing AI get substantially better at transfer learning (e.g. an AI trained on one genre of video game very quickly learning to play another genre) would update all participants substantially towards shorter timelines.

We ended the dialogue by having Ajeya, Daniel, and Ege put numbers on how much various AGI milestones would cause them to update their timelines (with the concrete milestones proposed by Daniel). Time constraints made it hard to go into as much depth as we would have liked, but Daniel and I are excited about fleshing out more concrete scenarios of how AGI could play out and then collecting more data on how people would update in such scenarios.

The Dialogue

habryka

​Daniel, Ajeya, and Ege all seem to disagree quite substantially on the question of "how soon is AI going to be a really big deal?". So today we set aside a few hours to try to dig into that disagreement, and what the most likely cruxes between your perspectives might be.

To keep things grounded and to make sure we don't misunderstand each other, we will be starting with two reasonably well-operationalized questions: 

  1. In what year would AI systems be able to replace 99% of current fully remote jobs? (With the operationalization stolen from an AI forecasting slide deck that Ajeya shared.)
  2. In what year will the energy consumption of humanity or its descendants be 1000x greater than now?

These are of course both very far from a perfect operationalization of AI risk (and I think for most people both of these questions are farther off than their answer to "how long are your timelines?"), but my guess is it will be good enough to elicit most of the differences in y'all's models and make it clear that there is indeed disagreement.

Visual probability distributions

habryka

​To start us off, here are two graphs of y'all's probability distributions:

When will 99% of fully remote jobs be automated?
When will we consume 1000x energy?

Opening statements

habryka

Ok, so let's get started: 

What is your guess about which belief of yours the other two most disagree with, that might explain some of the divergence in your forecasts? 

Daniel

Daniel Kokotajlo

I don't understand Ege's views very well at all yet, so I don't have much to say there. By contrast I have a lot to say about where I disagree with Ajeya. In brief: My training compute requirements distribution is centered a few OOMs lower than Ajeya's is. Why? For many reasons, but (a) I am much less enthusiastic about the comparison to the human brain than she is (or than I used to be!) and (b) I am less enthusiastic about the horizon length hypothesis / I think that large amounts of training on short-horizon tasks combined with small amounts of training on long-horizon tasks will work (after a few years of tinkering maybe).

habryka

Daniel: Just to clarify, it sounds like you approximately agree with a compute-focused approach of AI forecasting? As in, the key variable to forecast is how much compute is necessary to get AGI, with maybe some adjustment for algorithmic progress, but not a ton?

How do things like "AIs get good at different horizon lengths" play into this (which you were also mentioning as one of the potential domains of disagreement)? 

(For readers: The horizon length hypothesis is that the longer the feedback loops for a task are, the harder it is for an AI to get good at that task.

Balancing a broom on one end has feedback loops of less than a second. The task of "running a company" has month-to-year-long feedback loops. The hypothesis is that we need much more compute to get AIs that are good at the second than to get AIs that are good at the first. See also Richard Ngo's t-AGI framework, which posits that the domain in which AI is generally intelligent will gradually expand from short time horizons to long time horizons.)

Daniel Kokotajlo

Yep I think Ajeya's model (especially the version of it expanded by Davidson & Epoch) is our best current model of AGI timelines and takeoff speeds. I have lots to critique about it but it's basically my starting point. And I am qualified to say this, so to speak, because I actually did consider about a half-dozen different models back in 2020 when I was first starting to research the topic and form my own independent impressions, and the more I thought about it the more I thought the other models were worse. Examples of other models: Polling AI scientists, extrapolating gross world product (GWP) a la Roodman, deferring to what the stock markets imply, Hanson's weird fractional progress model thingy, the semi-informative prior models... I still put some weight on those other models but not much.

As for the horizon lengths question: This feeds into the training compute requirements variable. IIRC, Ajeya's original model had different buckets for short, medium, and long horizons, where e.g. the medium-horizon bucket meant roughly "Yeah, we'll be doing a combo of short-horizon and long-horizon training, but on average it'll be medium-horizon training, such that the compute costs will be e.g. [inference FLOPs]*[many trillions of datapoints, as per scaling laws applied to bigger-than-human-brain models]*[4-6 OOMs of seconds of subjective experience per datapoint on average]."

So Ajeya had most of her mass on the medium and long horizon buckets, whereas I was much more bullish that the bulk of the training could be short horizon with just a "cherry on top" of long-horizon. Quantitatively I was thinking something like "Say you have 100T datapoints of next-moment-prediction as part of your short-horizon pre-training. I claim that you can probably get away with merely 100M datapoints of million-second-long tasks, or less."
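
To make the arithmetic behind that claim concrete, here's a minimal sketch of the calculation (the specific numbers are placeholder assumptions of mine, not figures from Ajeya's report):

```python
# Rough illustration of how the horizon-length assumption multiplies the
# training-compute estimate. All numbers are made-up placeholders chosen to
# show the structure of the calculation, not figures from either report.

INFERENCE_FLOP_PER_SUBJECTIVE_SECOND = 1e15  # hypothetical cost to run the model for one "subjective second"

def training_flop(datapoints, seconds_per_datapoint):
    # Compute cost ~ [inference FLOP/s] * [number of datapoints] * [subjective seconds per datapoint]
    return INFERENCE_FLOP_PER_SUBJECTIVE_SECOND * datapoints * seconds_per_datapoint

# Medium-horizon bucket: every datapoint is ~1e5 subjective seconds long.
medium_horizon = training_flop(datapoints=1e12, seconds_per_datapoint=1e5)

# Short-horizon pre-training plus a small "cherry on top" of long-horizon tasks.
short_horizon = training_flop(datapoints=1e14, seconds_per_datapoint=1)         # next-moment prediction
long_horizon_cherry = training_flop(datapoints=1e8, seconds_per_datapoint=1e6)  # million-second tasks

print(f"medium-horizon bucket: {medium_horizon:.1e} FLOP")                       # ~1e32
print(f"short + cherry on top: {short_horizon + long_horizon_cherry:.1e} FLOP")  # ~2e29, a few OOMs lower
```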

For some intuitions why I think this, it may help to read this post and/or this comment thread.

Ege

Ege Erdil

I think my specific disagreements with Ajeya and Daniel might be a little different, but an important meta-level point is my general skepticism of arguments that imply wild conclusions. This becomes especially relevant with predictions of a 3 OOM increase in our energy consumption in the next 10 or 20 years. It's possible to tell a compelling story about why that might happen, but also possible to do the opposite, and judging how convincing those arguments should be is difficult for me.

Daniel Kokotajlo

OK, in response to Ege, I guess we disagree about this "that-conclusion-is-wild-therefore-unlikely" factor. I think for things like this it's a pretty poor guide to truth relative to other techniques (e.g. models, debates between people with different views, model-based debates between people with different views). I'm not sure how to make progress on resolving this crux. Ege, you say it's mostly in play for the 1000x energy consumption thing; wanna focus on discussing the other question instead?

Ege Erdil

Sure, discussing the other question first is fine.

I'm not sure why you think heuristics like "I don't update as much on specific arguments because I'm skeptical of my ability to do so" are ineffective, though. For example, this seems like it goes against the fractional Kelly betting heuristic from this post, which I would endorse in general: you want to defer to the market to some extent because your model has a good chance of being wrong.
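
As a concrete illustration of that heuristic, here is a minimal sketch of fractional Kelly as a blend between your own probability and the market's (my gloss, with arbitrary numbers; not taken from the linked post):

```python
# Minimal sketch of the fractional-Kelly idea. Betting a fraction of the full
# Kelly stake is roughly the same as betting full Kelly on a blend of your
# model's probability and the market-implied probability.

def blended_probability(p_model, p_market, weight_on_model=0.5):
    # Effective probability after partially deferring to the market.
    return weight_on_model * p_model + (1 - weight_on_model) * p_market

def kelly_fraction(p, decimal_odds):
    # Full Kelly stake for a binary bet paying (decimal_odds - 1) per unit staked.
    b = decimal_odds - 1
    return max(0.0, (p * b - (1 - p)) / b)

# Example: your model says 60% on some milestone, the market implies 20%.
p = blended_probability(p_model=0.6, p_market=0.2)   # -> 0.4
print(kelly_fraction(0.6, decimal_odds=5.0))          # 0.5: full Kelly on your model alone
print(kelly_fraction(p, decimal_odds=5.0))            # 0.25: a much smaller stake after deferring
```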

I don't know if it's worth going down this tangent right now, though, so it's probably more productive to focus on the first question for now.

I think my wider distribution on the first question is also affected by the same high-level heuristic, though to a lesser extent. In some sense, if I were to fully condition on the kind of compute-based model that you and Ajeya seem to have about how AI is likely to develop, I would probably come up with a probability distribution for the first question that more or less agrees with Ajeya's.

habryka

That's interesting. I think digging into that seems good to me. 

Can you say a bit more about how you are thinking about it at a high level? My guess is you have a bunch of broad heuristics, some of which are kind of like "well, the market doesn't seem to think AGI is happening soon?", and then those broaden your probability mass, but I don't know whether that's a decent characterization, and would be interested in knowing more of the heuristics that drive that. 

Ege Erdil

I'm not sure I would put that much weight on the market not thinking it's happening soon, because I think it's actually fairly difficult to tell what market prices would look like if the market did think it was happening soon.

Setting aside the point about the market and elaborating on the rest of my views: I would give a 50% chance that in 30 years, I will look back on something like Tom Davidson's takeoff model and say "this model captured all or most of the relevant considerations in predicting how AI development was likely to proceed". For me, that's already a fairly high credence to have in a specific class of models in such an uncertain domain.

However, conditional on this framework being importantly wrong, my timelines get substantially longer because I see no other clear path from where we are to AGI if the scaling pathway is not available. There could be other paths (e.g. large amounts of software progress) but they seem much less compelling.

If I thought the takeoff model from Tom Davidson (and some newer versions that I've been working on personally) were basically right, my forecasts would just look pretty similar to the forecasts of that model, and based on my experience with playing around in these models and the parameter ranges I would consider plausible, I think I would just end up agreeing with Ajeya on the first question.

Does that explanation make my view somewhat clearer?

habryka

However, conditional on this framework being importantly wrong, my timelines get substantially longer because I see no other clear path from where we are to AGI if the scaling pathway is not available. There could be other paths (e.g. large amounts of software progress) but they seem much less compelling.

This part really helps, I think. 

I would currently characterize your view as "Ok, maybe all we need is to increase compute scaling and do some things that are strictly easier than that (and so will be done by the time we have enough compute). But if that's wrong, forecasting when we'll get AGI gets much harder, since we don't really have any other concrete candidate hypothesis for how to get to AGI, and that implies a huge amount of uncertainty on when things will happen".

Ege Erdil

I would currently characterize your view as "Ok, maybe all we need is to increase compute scaling and do some things that are strictly easier than that (and so will be done by the time we have enough compute). But if that's wrong, forecasting when we'll get AGI gets much harder, since we don't really have any other concrete candidate hypothesis for how to get to AGI, and that implies a huge amount of uncertainty on when things will happen".

That's basically right, though I would add the caveat that entropy is relative so it doesn't really make sense to have a "more uncertain distribution" over when AGI will arrive. You have to somehow pick some typical timescale over which you expect that to happen, and I'm saying that once scaling is out of the equation I would default to longer timescales that would make sense to have for a technology that we think is possible but that we have no concrete plans for achieving on some reasonable timetable.

Ajeya Cotra

I see no other clear path from where we are to AGI if the scaling pathway is not available. There could be other paths (e.g. large amounts of software progress) but they seem much less compelling.

I think it's worth separating the "compute scaling" pathway into a few different pathways, or else giving the generic "compute scaling" pathway more weight because it's so broad. In particular, I think Daniel and I are living in a much more specific world than just "lots more compute will help;" we're picturing agents built from LLMs, more or less. That's very different from e.g. "We can simulate evolution." The compute scaling hypothesis encompasses both, as well as lots of messier in-between worlds. It's pretty much the one paradigm used by anyone in the past who was trying to forecast timelines and got anywhere close to predicting when AI would start getting interesting. Like I think Moravec is looking super good right now. In some sense, "we come up with a brilliant insight to do this way more efficiently than nature even when we have very little compute" is a hypothesis that should have had <50% weight a priori, compared to "capabilities will start getting good when we're talking about macroscopic amounts of compute."

Or maybe I'd say on priors you could have been 50/50 between "things will get more and more interesting the more compute we have access to" and "things will stubbornly stay super uninteresting even if we have oodles of compute because we're missing deep insights that the compute doesn't help us get"; but then when you look around at the world, you should update pretty hard toward the first.

Ajeya

Ajeya Cotra

On Daniel's opening points: I think I actually just agree with both a) and b) right now — or rather, I agree that the right question to ask about the training compute requirement is something more along the lines of "How many GPT-N to GPT-N.5 jumps do we think it would take?", and that short horizon LLMs plus tinkering looks more like "the default" than like "one of a few possibilities," where other possibilities included a more intense meta-learning step (which is how it felt in 2019-20). The latter was the biggest adjustment in my updated timelines.

That said though, I think two important object-level points push the "needed model size" and "needed amount of tinkering" higher in my mind than it is for Daniel:

  • In-context learning does seem pretty bad, and doesn't seem to be improving a huge amount. I think we can have TAI without really strong human-like in-context learning, but it probably requires more faff than if we had that out of the gate.
  • Relatedly, adversarial robustness seems not-great right now.  This also feels overcomable, but I think it increases the scale that you need (by analogy, like 5-10 years ago it seemed like vision systems were good enough for cars except in the long tail / in adversarial settings, and I think vision systems had to get a fair amount bigger, plus there had to be a lot more tinkering on the cars, to get to now where they're starting to be viable).

And then a meta-level point is that I (and IIRC Metaculus, according to my colleague Isabel) have been kind of surprised for the last few years about the lack of cool products built on LLMs (it seems like 2023 brought the level of cool products I was naively picturing in 2021). I think there's a "reality has a lot of detail, actually making stuff work is a huge pain" dynamic going on, and it lends credence to the "things will probably be fairly continuous" heuristic that I already had.

A few other meta-points:

  • The Paul self-driving car bets post was interesting to me, and I place some weight on "Daniel is doing the kind of 'I can see how it would be done so it's only a few years away' move that I think doesn't serve as a great guide to what will happen in the real world."
  • Carl is the person who seems like he's been the most right when we've disagreed, so he's probably the one guy whose views I put the most weight on. But Carl also seems like he errs on the aggressive side, and errs in the direction of believing people will aggressively pursue the most optimal thing (being more surprised than I was, for a longer period of time, about how people haven't invested more in AI and accomplished more with it by now). His timelines are longer than Daniel's, and I think mine are a bit longer than his.
  • In general, I do count Daniel as among a pretty small set of people who were clearly on record with views more correct than mine in 2020 about both the nature of how TAI would be built (LLMs+tinkering) and how quickly things would progress. Although it's a bit complicated because 2020-me thought we'd be seeing more powerful LLM products by now.
  • Other people who I think were more right include Carl, Jared Kaplan, Danny Hernandez, Dario, Holden, and Paul. Paul is interesting because I think he both put more weight than I did on "it's just LLMs plus a lot of decomposition and tinkering" but also puts more weight than either me or Daniel on "things are just hard and annoying and take a long time;" this left him with timelines similar to mine in 2020, and maybe a bit longer than mine now.
  • Oh — another point that seems interesting to discuss at some point is that I suspect Daniel generally wants to focus on a weaker endpoint because of some sociological views I disagree with. (Screened off by the fact that we were answering the same question about remotable jobs replacement, but I think hard to totally screen off.)

On in-context learning as a potential crux

Daniel Kokotajlo

Re: Ajeya:

  • Interesting, I thought the biggest adjustment to your timelines was the pre-AGI R&D acceleration modelled by Davidson. That was another disagreement between us originally that ceased being a disagreement once you took that stuff into account.
  • re: in-context learning: I don't have much to say on this & am curious to hear more. Why do you think it needs to get substantially better in order to reach AGI, and why do you think it's not on track to do so? I'd bet that GPT4 is way better than GPT3 at in-context learning for example.
  • re: adversarial robustness: Same question I guess. My hot take would be (a) it's not actually that important, the way forward is not to never make errors in the first place but rather to notice and recover from them enough that the overall massive parallel society of LLM agents moves forward and makes progress, and (b) adversarial robustness is indeed improving. I'd be curious to hear more, perhaps you have data on how fast it is improving and you extrapolate the trend and think it'll still be sucky by e.g. 2030?
  • re: schlep & incompetence on the part of the AGI industry: Yep, you are right about this, and I was wrong. Your description of Carl also applies to me historically; in the past three years I've definitely been a "this is the fastest way to AGI, therefore at least one of the labs will do it with gusto" kind of guy, and now I see that is wrong. I think basically I fell for the planning fallacy & efficient market hypothesis fallacy.

    However, I don't think this is the main crux between us, because it basically pushes things back by a few years, it doesn't e.g. double (on a log scale) the training requirements. My current, updated model of timelines, therefore, is that the bottleneck in the next five years is not necessarily compute but instead quite plausibly schlep & conviction on the part of the labs. This is tbh a bit of a scary conclusion.

Ajeya Cotra

re: in-context learning: I don't have much to say on this & am curious to hear more. Why do you think it needs to get substantially better in order to reach AGI, and why do you think it's not on track to do so? I'd bet that GPT4 is way better than GPT3 at in-context learning for example.

The traditional image of AGI involves having an AI system that can learn new (to it) skills as efficiently as humans (with as few examples as humans would need to see). I think this is not how the first transformative AI system will look, because ML is less sample efficient than humans and it doesn't look like in-context learning is on track to being able to do the kind of fast sample-efficient learning that humans do. I think this is not fatal for getting TAI, because we can make up for it by a) the fact that LLMs' "ancestral memory" contains all sorts of useful information about human disciplines that they won't need to learn in-context, and b) explicitly guiding the LLM agent to "reason out loud" about what lessons it should take away from its observations and putting those in an external memory it retrieves from, or similar.

I think back when Eliezer was saying that "stack more layers" wouldn't get us to AGI, this is one of the kinds of things he was pointing to: that cognitively, these systems didn't have the kind of learning/reflecting flexibility that you'd think of re AGI. When people were talking about GPT-3's in-context learning, I thought that was one of the weakest claims by far about its impressiveness. The in-context learning at the time was like: you give it a couple of examples of translating English to French, and then you give it an English sentence, and it dutifully translates that into French. It already knew English and it already knew French (from its ancestral memory), and the thing it "learned" was that the game it was currently playing was to translate from English to French.

I agree that 4 is a lot better than 3 (for example, you can teach 4 new games like French Toast or Hitler and it will play them — unless it already knows that game, which is plausible). But compared to any object-level skill like coding (many of which are superhuman), in-context learning seems quite subhuman. I think this is related to how ARC Evals' LLM agents kind of "fell over" doing things like setting up Bitcoin wallets.

Like Eliezer often says, humans evolved to hunt antelope on the savannah, and that very same genome, coding for that very same brain, can build rockets and run companies. Our LLMs right now generalize further from their training distribution than skeptics in 2020 would have said, and they're generalizing further and further as they get bigger, but they have nothing like the kind of savannah-to-boardroom generalization we have. This can create lots of little issues in lots of places when an LLM needs to digest some new-to-it development and do something intelligent with it. Importantly, I don't think this is going to stop LLM-agent-based TAI from happening, but it's one concrete limitation that pushes me towards thinking we'll need more scale or more schlep than it looks like we'll need before taking this into account.

Adversarial robustness, which I'll reply to in another comment, is similar: a concrete hindrance that isn't fatal but is one reason I think we'll need more scale and schlep than it seems like Daniel does (despite agreeing with his concrete counterarguments of the form "we can handle it through X countermeasure").

Daniel Kokotajlo

Re: Ajeya: Thanks for that lengthy reply. I think I'll have to ponder it for a bit. Right now I'm stuck with a feeling that we agree qualitatively but disagree quantitatively.

Ege Erdil

I think it's worth separating the "compute scaling" pathway into a few different pathways, or else giving the generic "compute scaling" pathway more weight because it's so broad. In particular, I think Daniel and I are living in a much more specific world than just "lots more compute will help;" we're picturing agents built from LLMs, more or less. That's very different from e.g. "We can simulate evolution." The compute scaling hypothesis encompasses both, as well as lots of messier in-between worlds.

I think it's fine to incorporate these uncertainties as a wider prior over the training compute requirements, and I also agree it's a reason to put more weight on this broad class of models than you otherwise would, but I still don't find these reasons compelling enough to go significantly above 50%. It just seems pretty plausible to me that we're missing something important, even if any specific thing we can name is unlikely to be what we're missing.

To give one example, I initially thought that the evolution anchor from the Bio Anchors report looked quite solid as an upper bound, but I realized some time after that it doesn't actually have an appropriate anthropic correction and this could potentially mess things up. I now think if you work out the details this correction turns out to be inconsequential, but it didn't have to be like that: this is just a consideration that I missed when I first considered the argument. I suppose I would say I don't see a reason to trust my own reasoning abilities as much as you two seem to trust yours.

The compute scaling hypothesis is much broader, and it's pretty much the one paradigm used by anyone in the past who was trying to forecast timelines and got anywhere close to predicting when AI would start getting interesting. Like I think Moravec is looking super good right now.

My impression is that Moravec predicted in 1988 that we would have AI systems comparable to the human brain in performance around 2010. If this actually happens around 2037 (your median timeline), Moravec's forecast will have been off by around a factor of 2 in terms of the time differential from when he made the forecast (he predicted roughly 22 years out; 2037 would be roughly 49 years out). That doesn't seem "super good" to me.

Maybe I'm wrong about exactly what Moravec predicted - I didn't read his book and my knowledge is second-hand. In any event, I would appreciate getting some more detail from you about why you think he looks good.

Or maybe I'd say on priors you could have been 50/50 between "things will get more and more interesting the more compute we have access to" and "things will stubbornly stay super uninteresting even if we have oodles of compute because we're missing deep insights that the compute doesn't help us get"; but then when you look around at the world, you should update pretty hard toward the first.

I agree that if I were considering two models at those extremes, recent developments would update me more toward the former model. However, I don't actually disagree with the abstract claim that "things will get more and more interesting the more compute we have access to" - I expect more compute to make things more interesting even in worlds where we can't get to AGI by scaling compute. 

Ege Erdil

I agree that 4 is a lot better than 3 (for example, you can teach 4 new games like French Toast or Hitler and it will play them — unless it already knows that game, which is plausible).

A local remark about this: I've seen a bunch of reports from other people that GPT-4 is essentially unable to play tic-tac-toe, which is a shortcoming I found highly surprising. Given the number of impressive things it can otherwise do, failing at a simple game whose full solution could well be in its training set is really odd.

So while I agree 4 seems better than 3, it still has some bizarre weaknesses that I don't think I understand well.

habryka

Ege: Just to check, GPT-4V (vision model) presumably can play tic-tac-toe easily? My sense is that this is just one of these situations where tokenization and one-dimensionality of text makes something hard, but it's trivial to get the system to learn it if it's in a more natural representation.

Ege Erdil

Just to check, GPT-4V (vision model) presumably can play tic-tac-toe easily?


This random Twitter person says that it can't. Disclaimer: haven't actually checked for myself.

Taking into account government slowdown

habryka

As a quick question, to what degree do y'all's forecasts above take into account governments trying to slow things down and companies intentionally going slower because of risks?

Seems like a relevant dimension that's not obviously reflected in usual compute models, and just want to make sure that's not accidentally causing some perceived divergence in people's timelines.

Daniel Kokotajlo

I am guilty of assuming governments and corporations won't slow things down by more than a year. I think I mostly still endorse this assumption, but I'm hopeful that instead they'll slow things down by several years or more. Historically I've been arguing with people who disagreed with me on timelines by decades, not years, so it didn't seem important to investigate this assumption. That said I'm happy to say why I still mostly stand by it. Especially if it turns out to be an important crux (e.g. if Ege or Ajeya think that AGI would probably happen by 2030 absent slowdown).

habryka

That said I'm happy to say why I still mostly stand by it.

Cool, might be worth investigating later if it turns out to be a crux.

Ege Erdil

As a quick question, to what degree do y'all's forecasts above take into account governments trying to slow things down and companies intentionally going slower because of risks?

Seems like a relevant dimension that's not obviously reflected in usual compute models, and just want to make sure that's not accidentally causing some perceived divergence in people's timelines.

Responding to habryka: I do think government regulations, companies slowing down because of risks, companies slowing down because they are bad at coordination, capital markets being unable to allocate the large amounts of capital needed for huge training runs for various reasons, etc. could all be important. However, my general heuristic for thinking about the issue is more "there could be a lot of factors I'm missing" and less "I think these specific factors are going to be very important".

In terms of the impact of capable AI systems, I would give significantly less than even but still non-negligible odds that these kinds of factors end up limiting the acceleration in economic growth to e.g. less than an order of magnitude.

Ajeya Cotra

As a quick question, to what degree do y'all's forecasts above take into account governments trying to slow things down and companies intentionally going slower because of risks?

I include this in a long tail of "things are just slow" considerations, although in my mind it's mostly not people making a concerted effort to slow down because of x-risk, but rather just the thing that happens to any sufficiently important technology that has a lot of attention on it: a lot of drags due to the increasing number of stakeholders, both drags where companies are less blasé about releasing products because of PR concerns, and drags where governments impose regulations (which I think they would have done in any world, with or without the efforts of the x-risk-concerned contingent).

habryka

Slight meta: I am interested in digging in a bit more to find some possible cruxes between Daniel and Ajeya, before going more in-depth between Ajeya and Ege, just to keep the discussion a bit more focused.

Recursive self-improvement and AI's speeding up R&D

habryka

Daniel: Just for my own understanding, you have adjusted the compute-model to account for some amount of R&D speedup as a result of having more AI researchers. 

To what degree does that cover classical recursive self-improvement or things in that space? (E.g. AI systems directly modifying their training process or weights, or developing their own pre-processing modules?)

Or do you expect a feedback loop that's more "AI systems do research that routes through humans understanding those insights and being in the loop on implementing them to improve the AI systems"? 

Daniel Kokotajlo

When all we had was Ajeya's model, I had to make my own scrappy guess at how to adjust it to account for R&D acceleration due to pre-AGI systems. Now we have Davidson's model so I mostly go with that.

It covers recursive-self-improvement as a special case. I expect that to be what the later, steeper part of the curve looks like (basically a million AutoGPTs running in parallel across several datacenters, doing AI research but 10-100x faster than humans would, with humans watching the whole thing from the sidelines clapping as metrics go up); the earlier part of the curve looks more like "every AGI lab researcher has access to a team of virtual engineers that work at 10x speed and sometimes make dumb mistakes" and then the earliest part of the curve is what we are seeing now with copilot and chatgpt helping engineers move slightly faster.

Ajeya Cotra

Interesting, I thought the biggest adjustment to your timelines was the pre-AGI R&D acceleration modelled by Davidson. That was another disagreement between us originally that ceased being a disagreement once you took that stuff into account.

These are entangled updates. If you're focusing on just "how can you accelerate ML R&D a bunch," then it seems less important to be able to handle low-feedback-loop environments quite different from the training environment. By far the biggest reason I thought we might need longer horizon training was to imbue the skill of efficiently learning very new things (see here). 

Ajeya Cotra

Right now I'm stuck with a feeling that we agree qualitatively but disagree quantitatively.

I think this is basically right!

Ajeya Cotra

re: adversarial robustness: Same question I guess. My hot take would be (a) it's not actually that important, the way forward is not to never make errors in the first place but rather to notice and recover from them enough that the overall massive parallel society of LLM agents moves forward and makes progress, and (b) adversarial robustness is indeed improving. I'd be curious to hear more, perhaps you have data on how fast it is improving and you extrapolate the trend and think it'll still be sucky by e.g. 2030?

I'll give a less lengthy reply here, since structurally it's very similar to in-context learning, and has the same "agree-qualitatively-but-not-quantitatively" flavor. (For example, I definitely agree that the game is going to be coping with errors and error-correction, not never making errors; we're talking about whether that will take four years or more than four years.) 

"Not behaving erratically / falling over on super weird or adversarial inputs" is a higher-level-of-abstraction cognitive skill humans are way better at than LLMs. LLMs are improving at this skill with scale (like all skills), and there are ways to address it with schlep and workflow rearrangements (like all problems), and it's unclear how important it is in the first place. But it's plausibly fairly important, and it seems like their current level is "not amazing," and the trend is super unclear but not obviously going to make it in four years. 

In general, when you're talking about "Will it be four years from now or more than four years from now?", uncertainty and FUD on any point (in-context-learning, adversarial robustness, market-efficiency-and-schlep) pushes you toward "more than four years from now" — there's little room for it to push in the other direction.

Ege Erdil

In general, when you're talking about "Will it be four years from now or more than four years from now?", uncertainty and FUD on any point (in-context-learning, adversarial robustness, market-efficiency-and-schlep) pushes you toward "more than four years from now"

I'm curious why Ajeya thinks this claim is true for "four years" but not true for "twenty years" (assuming that's an accurate representation of her position, which I'm not too confident about).

Ajeya Cotra

I'm curious why Ajeya thinks this claim is true for "four years" but not true for "twenty years" (assuming that's an accurate representation of her position, which I'm not too confident about).

I don't think it's insane to believe this to be true of 20 years, but I think we have many more examples of big, difficult, society-wide things happening over 20 years than over 4.

Daniel Kokotajlo

Quick comment re: in-context learning and/or low-data learning: It seems to me that GPT-4 is already pretty good at coding, and a big part of accelerating AI R&D seems very much in reach -- like, it doesn't seem to me like there is a 10-year, 4-OOM-training-FLOP gap between GPT4 and a system which is basically a remote-working OpenAI engineer that thinks at 10x serial speed. Even if the research scientists are still human, this would speed things up a lot I think. So while I find the abstract arguments about how LLMs are worse at in-context learning etc. than humans plausible, when I think concretely about AI R&D acceleration it still seems like it's gonna start happening pretty soon, and that makes me also update against the abstract argument a bit.

habryka

So, I kind of want to check an assumption. On a compute-focused worldview, I feel a bit confused about how having additional AI engineers helps that much. Like, maybe this is a bit of a strawman, but my vibe is that there hasn't really been much architectural innovation or algorithmic progress in the last few years, and the dominant speedup has come from pouring more compute into existing architectures (with some changes to deal with the scale, but not huge ones).

Daniel, could you be more concrete about how a 10x AI engineer actually helps develop AGI? My guess is on a 4-year timescale you don't expect it to route through semiconductor supply chain improvements.

And then I want to check what Ajeya thinks here and whether something in this space might be a bit of a crux. My model of Ajeya does indeed think that AI systems in the next few years will be impressive, but not really actually that useful for making AI R&D go better, or at least not like orders of magnitude better.

Ege Erdil

Like, maybe this is a bit of a strawman, but my vibe is that there hasn't really been much architectural innovation or algorithmic progress in the last few years, and the dominant speedup has come from pouring more compute into existing architectures (with some changes to deal with the scale, but not huge ones).

My best guess is that algorithmic progress has probably continued at a rate of around a doubling of effective compute per year, at least insofar as you buy that one-dimensional model of algorithmic progress. Again, model uncertainty is a significant part of my overall view about this, but I think it's not accurate to say there hasn't been much algorithmic progress in the last few years. It's just significantly slower than the pace at which we're scaling up compute so it looks relatively less impressive.
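
For concreteness, here's a minimal sketch of the one-dimensional model being referred to (the growth rates are my own illustrative assumptions, not Epoch estimates):

```python
# Minimal sketch of the one-dimensional "effective compute" framing, where
# algorithmic progress is modeled as a multiplier on physical compute.

def effective_compute_multiplier(years, hardware_scaling_per_year=4.0, algo_doublings_per_year=1.0):
    physical = hardware_scaling_per_year ** years            # from bigger training runs
    algorithmic = 2.0 ** (algo_doublings_per_year * years)   # ~1 doubling of effective compute per year
    return physical * algorithmic

# Over 5 years at these assumed rates: ~1000x from scaling up physical compute,
# ~32x from algorithmic progress -- real progress, but it looks small next to scaling.
print(4.0 ** 5, 2.0 ** 5, effective_compute_multiplier(5))
```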

(Daniel, Ajeya +1 this comment)

habryka

I was modeling one doubling a year as approximately not very much, compared to all the other dynamics involved, though of course it matters a bunch over the long run.

Daniel Kokotajlo

Re: Habryka's excellent point about how maybe engineering isn't the bottleneck, maybe compute is instead: 

My impression is that roughly half the progress has come from increased compute and the other half from better algorithms. Going forward when I think concretely about the various limitations of current algorithms and pathways to overcome them -- which I am hesitant to go into detail about -- it sure does seem like there are still plenty of low and medium-hanging fruit to pick, and then high-hanging fruit beyond which would take decades for human scientists to get to but which can perhaps be reached much faster during an AI takeoff.

I am on a capabilities team at OpenAI right now and I think that we could be going something like 10x faster if we had the remote engineer thing I mentioned earlier. And I think this would probably apply across most of OpenAI research. This wouldn't accelerate our compute acquisition much at all, to be clear, but that won't stop a software singularity from happening. Davidson's model backs this up, I think -- I'd guess that if you magically changed it to keep hardware & compute progress constant, you'd still get a rapid R&D acceleration, just a somewhat slower one.

I'd think differently if I thought that parameter count was just Too Damn Low, like I used to think. If I was more excited about the human brain size comparison, I might think that nothing short of 100T parameters (trained according to Chinchilla also) could be AGI, and therefore that even if we had a remote engineer thinking at 10x speed we'd just eat up the low-hanging fruit and then stall while we waited for bigger computers to come online. But I don't think that.

Ajeya Cotra

On a compute-focused worldview, I feel a bit confused about how having additional AI engineers helps that much. Like, maybe this is a bit of a strawman, but my vibe is that there hasn't really been much architectural innovation or algorithmic progress in the last few years, and the dominant speedup has come from pouring more compute into existing architectures (with some changes to deal with the scale, but not huge ones).

I think there haven't been flashy paradigm-shifting insights, but I strongly suspect each half-GPT was a hard-won effort on a lot of fronts, including a lot of mundane architecture improvements (like implementing long contexts in less naive ways that don't incur quadratic cost), a lot of engineering to do the model parallelism and other BS that is required to train bigger models without taking huge GPU utilization hits, and a lot of post-training improvements to make nice, usable products.

habryka

Ajeya: What you say seems right, but the things you say also don't sound like the kind of thing where, if you accelerate them 10x, you get AGI 10x earlier. As you said, there's a lot of BS required to train large models, a lot of productization, but that doesn't speed up the semiconductor supply chain.

The context length and GPU utilization thing feels most relevant.

Ajeya Cotra

Ajeya: What you say seems right, but the things you say also don't sound like the kind of thing where, if you accelerate them 10x, you get AGI 10x earlier. As you said, there's a lot of BS required to train large models, a lot of productization, but that doesn't speed up the semiconductor supply chain.

Yeah, TBC, I think there's a higher bar than Daniel thinks there is to speeding stuff up 10x for reasons like this. I do think that there's algorithm juice, like Daniel says, but I don't think that a system you look at and naively think "wow this is basically doing OAI ML engineer-like things" will actually lead to a full 10x speedup; 10x is a lot.

I think you will eventually get the 10x, and then the 100x, but I'm picturing that happening after some ramp-up where the first ML-engineer-like systems get integrated into workflows, improve themselves, change workflows to make better use of themselves, etc.

Ajeya Cotra

Quick comment re: in-context learning and/or low-data learning: It seems to me that GPT-4 is already pretty good at coding, and a big part of accelerating AI R&D seems very much in reach.

Agree this is the strongest candidate for crazy impacts soon, which is why my two updates of "maybe meta-learning isn't that important and therefore long horizon training isn't as plausibly necessary" and "maybe I should just be obsessed with forecasting when we have the ML-research-engineer-replacing system because after that point progress is very fast" are heavily entangled. (Daniel reacts "+1" to this)

 -- like, it doesn't seem to me like there is a 10-year, 4-OOM-training-FLOP gap between GPT4 and a system which is basically a remote OpenAI engineer that thinks at 10x serial speed

I don't know, 4 OOM is less than two GPTs, so we're talking less than GPT-6. Given how consistently I've been wrong about how well "impressive capabilities in the lab" will translate to "high economic value" since 2020, this seems roughly right to me?

Daniel Kokotajlo

I don't know, 4 OOM is less than two GPTs, so we're talking less than GPT-6. Given how consistently I've been wrong about how well "impressive capabilities in the lab" will translate to "high economic value" since 2020, this seems roughly right to me?

I disagree with this update -- I think the update should be "it takes a lot of schlep and time for the kinks to be worked out and for products to find market fit" rather than "the systems aren't actually capable of this." Like, I bet if AI progress stopped now, but people continued to make apps and widgets using fine-tunes of various GPTs, there would be OOMs more economic value being produced by AI in 2030 than today.

And so I think that the AI labs will be using AI remote engineers much sooner than the general economy will be. (Part of my view here is that around the time it is capable of being a remote engineer, the process of working out the kinks / pushing through schlep will itself be largely automatable.)

Ege Erdil

Like, I bet if AI progress stopped now, but people continued to make apps and widgets using fine-tunes of various GPTs, there would be OOMs more economic value being produced by AI in 2030 than today.


I'm skeptical we would get 2 OOMs or more with just the current capabilities of AI systems, but I think even if you accept that, scaling from $1B/yr to $100B/yr is easier than from $100B/yr to $10T/yr. Accelerating AI R&D by 2x seems more like the second change to me, or even bigger than that.

Ajeya Cotra

And so I think that the AI labs will be using AI remote engineers much sooner than the general economy will be. (Part of my view here is that around the time it is capable of being a remote engineer, the process of working out the kinks / pushing through schlep will itself be largely automatable.)

I agree with this

Daniel Kokotajlo

Yeah idk I pulled that out of my ass, maybe ​2 OOM is more like the upper limit given how much value there already is. I agree that going from X to 10X is easier than going from 10X to 100X, in general. I don't think that undermines my point though. I disagree with your claim that making AI progress go 2x faster is more like scaling from $100B to $10T-- I think it depends on when it happens! Right now in our state of massive overhang and low-hanging-fruit everywhere, making AI progress go 2x faster is easy.

Also to clarify when I said 10x faster I meant 10x faster algorithmic progress; compute progress won't accelerate by 10x obviously. But what this means is that I think we'll transition from a world where half or more of the progress is coming from scaling compute, to a world where most of the progress is coming from algorithmic improvements / pushing-through-schlep.

Do we expect transformative AI pre-overhang or post-overhang?

habryka

I think a hypothesis I have for a possible crux for a lot of the disagreement between Daniel and Ajeya is something like "will we reach AGI before the compute overhang is over vs. after?". 

Like, inasmuch as we think we are in a compute-overhang situation, there is an extremization that applies to people's timelines: if you expect we'll get there using just the remaining capital and compute, you expect quite short timelines, but if you expect it will require faster chips or substantial algorithmic improvements, you expect longer ones, with less probability mass in between.

Curious about Daniel and Ajeya answering the question of "what probability do you assign to AGI before we exhausted the current compute overhang vs. after?"

Ajeya Cotra

"what probability do you assign to AGI before we exhausted the current compute overhang vs. after?" 

I think there are different extremities of compute overhang. The most extreme one, which will be exhausted most quickly, is like "previously these companies were training AI systems on what is essentially chump change, and now we're starting to get into a world where it's real money, and soon it will be really real money." I think within 3-4 years we'll be talking tens of billions for a training run; I think the probability we get drop-in replacements for 99% of remotable jobs (regardless of whether we've rolled those drop-in replacements out everywhere) by then is something like...25%?

And then after that, progress is still pretty compute-centric, but it moves slower because you're spending very real amounts of money, and you're impacting the entire supply chain: you need to build more datacenters which come with new engineering challenges, more chip-printing facilities, more fabs, more fab equipment manufacturing plants, etc.
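
To put rough numbers on that first, fast-burning kind of overhang, here's a minimal sketch (the starting cost, growth rate, and ceiling are assumptions of mine for illustration, not figures from this dialogue):

```python
# Minimal sketch of how quickly the "just spend more money" overhang gets
# used up under some illustrative assumptions.

cost = 1e8          # assumed ~$100M frontier training run today
spend_growth = 4.0  # assumed yearly growth in what labs are willing/able to spend
ceiling = 2e10      # rough point ("tens of billions") where you're no longer buying
                    # readily available compute and have to wait on the supply chain

years = 0
while cost < ceiling:
    cost *= spend_growth
    years += 1

print(years, f"${cost:.1e}")  # -> 4 years at these assumptions, in the same ballpark as the 3-4 years above
```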

Daniel Kokotajlo

re: Habryka: Yes we disagree about whether the current overhang is enough. But the cruxes for this are the things we are already discussing.

habryka

re: Habryka: Yes we disagree about whether the current overhang is enough. But the cruxes for this are the things we are already discussing.

Cool, that makes sense. That does seem like it might exaggerate the perceived disagreements between the two of you, when you just look at the graphs, though it's of course still highly decision-relevant to dig deeper into whether this is true or not.

Hofstadter's law in AGI forecasting

Ajeya Cotra

TBC Daniel, I think we differ by a factor of 2 on the probability for your median scenario. I feel like the general structure of our disagreements has been: you (Daniel) are describing a scenario that makes sense and which I place a lot of weight on, but it seems like there are other scenarios, and it seems like your whole timetable leaves little room for Hofstadter's law.

Ege Erdil

I feel like the general structure of our disagreements has been: you (Daniel) are describing a scenario that makes sense and which I place a lot of weight on, but it seems like there are other scenarios, and it seems like your whole timetable leaves little room for Hofstadter's law.

I think this also applies to the disagreement between me and Ajeya.

Daniel Kokotajlo

A thing that would change my mind is if I found other scenarios more plausible. Wanna sketch some?

Regarding Hofstadter's law: A possible crux between us is that you both seem to think it applies on timescales of decades -- a multiplicative factor on timelines -- whereas I think it's more like "add three years." Right?

Ege Erdil

Re: Hofstadter's law: A possible crux between us is that you both seem to think it applies on timescales of decades -- a multiplicative factor on timelines -- whereas I think it's more like "add three years." Right?

Yes, in general, that's how I would update my timelines about anything to be longer, not just AGI. The additive method seems pretty bad to me unless you have some strong domain-specific reason to think you should be making an additive update.

Daniel Kokotajlo

Yes, in general, that's how I would update my timelines about anything to be longer, not just AGI. The additive method seems pretty bad to me unless you have some strong domain-specific reason to think you should be making an additive update.

Excellent. So my reason for doing the additive method is that I think Hofstadter's law / schlep / etc. is basically the planning fallacy, and it applies when your forecast is based primarily on imagining a series of steps being implemented. It does NOT apply when your forecast is based primarily on extrapolating trends. Like, you wouldn't look at a graph of exponential progress in Moore's law or solar power or whatever and then be like "but to account for Hofstadter's Law I will assume things take twice as long as I expect, therefore instead of extrapolating the trend-line straight I will cut its slope by half."

And when it comes to AGI timelines, I think that the shorter-timeline scenarios look more subject to the planning fallacy, whereas the longer-timeline scenarios look more like extrapolating trends.

So in a sense I'm doing the multiplicative method, but only on the shorter worlds. Like, when I say 2027 as my median, that's kinda because I can actually quite easily see it happening in 2025, but things take longer than I expect, so I double it... I'm open to being convinced that I'm not taking this into account enough and should shift my timelines back a few years more; however I find it very implausible that I should add e.g. 15 years to my median because of this.
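
To spell out the difference between the two correction methods numerically (a minimal sketch; the current-year baseline and the factor of 2 are just illustrative assumptions):

```python
# Minimal sketch of the additive vs. multiplicative Hofstadter correction
# discussed above.

CURRENT_YEAR = 2023  # assumed baseline for "time remaining from now"

def additive_correction(naive_year, extra_years=3):
    return naive_year + extra_years

def multiplicative_correction(naive_year, factor=2.0):
    # Multiplies the time remaining from now, not the calendar year itself.
    return CURRENT_YEAR + factor * (naive_year - CURRENT_YEAR)

print(multiplicative_correction(2025))  # 2 years out, doubled -> 2027
print(multiplicative_correction(2035))  # 12 years out, doubled -> 2047
print(additive_correction(2035))        # vs. simply pushing it back to 2038
```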

Summary of where we are at so far and exploring additional directions

habryka

We've been going for a while and it might make sense to take a short step back. Let me try to summarize where we are at: 

We've been mostly focusing on the disagreement between Ajeya and Daniel. One core theme in the discussion has been the degree to which "reality has a lot of detail, and kinks need to be figured out before AI systems are actually useful". Ajeya currently thinks that while it's true that AGI companies will have access to these tools earlier, there will still be a lot to figure out before you actually have a system equivalent to a current OAI engineer. Daniel made a similar update after noticing a larger-than-he-expected delay in the transition from "having all the stuff necessary to make a more capable system, like architecture, compute, and training setup" to "actually producing a more capable system".

However, it's also not clear how much this actually explains the differences in the timelines for the two of you. 

We briefly touched on compute overhangs being a thing that's very relevant to both of your distributions, in that Daniel assigns substantially higher probability to a very high R&D speed-up before the current overhang is exhausted, which pushes his probability mass a bunch earlier. And correspondingly Ajeya's timelines are pretty sensitive to relatively small changes in compute requirements on the margin, since that would push a bunch of probability mass into the pre-overhang world.

Ajeya Cotra

I'll put in a meta note here: I think it's pretty challenging to argue about 25% vs. 50% on the Daniel scenario -- that's literally one bit of evidence that one of us sees and the other doesn't. It seems like Daniel thinks I need stronger arguments/evidence than I have to be at 25% instead of 50%, but it's easy to find one bit somewhere and hard to argue about whether it really is one bit.

Exploring conversational directions

Daniel Kokotajlo

In case anyone is interested, here are some possible conversation topics/starters:

(1) I could give a scenario in which AGI happens by some very soon date, e.g. December 2024 or 2026, and then we could talk about what parts of the scenario are most unlikely (~= what parts would cause the biggest updates to us if we observed them happening)

(2) Someone without secrecy concerns (i.e. someone not working at OpenAI, i.e. Ajeya or Ege or Habryka) could sketch what they think they would aim to have built by 2030 if they were in charge of a major AI lab and were gunning for AGI asap. Parameter count, training FLOP, etc. taken from standard projections, but then more details like what the training process and data would look like etc. Then we could argue about what this system would be capable of and what it would be incapable of, e.g. how fast would it speed up AI R&D compared to today.

(2.5) As above except for convenience we use Steinhardt's What will GPT-2030 look like? and factor the discussion into (a) will GPT-2030 be capable of the things he claims it will be capable of, and (b) will that cause a rapid acceleration of AI R&D leading shortly to AGI?

(3) Ege or Ajeya could sketch a scenario in which the year 2035 comes and goes without AGI, despite there being no AI progress slowdown (no ban, no heavy regulation, no disruptive war, etc.). Then I could say why I think such a scenario is implausible, and we could discuss more generally what that world looks like.

Ajeya Cotra

On Daniel's four topics:

(1) I could give a scenario in which AGI happens by some very soon date, e.g. December 2024 or 2026, and then we could talk about what parts of the scenario are most unlikely (~= what parts would cause the biggest updates to us if we observed them happening)

I suspect I'll be like "Yep, seems plausible, and my probability on it coming to pass is 2-5x smaller."

(2) Someone without secrecy concerns (i.e. someone not working at OpenAI, i.e. Ajeya or Ege or Habryka) could sketch what they think they would aim to have built by 2030 if they were in charge of a major AI lab and were gunning for AGI asap. Parameter count, training FLOP, etc. taken from standard projections, but then more details like what the training process and data would look like etc. Then we could argue about what this system would be capable of and what it would be incapable of, e.g. how fast would it speed up AI R&D compared to today.

I could do this if people thought it would be useful.

(2.5) As above except for convenience we use Steinhardt's What will GPT-2030 look like? and factor the discussion into (a) will GPT-2030 be capable of the things he claims it will be capable of, and (b) will that cause a rapid acceleration of AI R&D leading shortly to AGI?

I like this blog post but I feel like it's quite tame compared to what both Daniel and I think is plausible so not sure if it's the best thing to anchor on.

(3) Ege or Ajeya could sketch a scenario in which the year 2035 comes and goes without AGI, despite there being no AI progress slowdown (no ban, no heavy regulation, no disruptive war, etc.). Then I could say why I think such a scenario is implausible, and we could discuss more generally what that world looks like.

I can do this if people thought it would be useful.

Ege's median world

Ege Erdil

My median world looks something like this: we keep scaling compute until we hit training runs at a size of 1e28 to 1e30 FLOP in maybe 5 to 10 years, and after that scaling becomes increasingly difficult because of us running up against supply constraints. Software progress continues but slows down along with compute scaling. However, the overall economic impact of AI continues to grow: we have individual AI labs in 10 years that might be doing on the order of e.g. $30B/yr in revenue.

We also get more impressive capabilities: maybe AI systems can get gold on the IMO in five years, we get more reliable image generation, GPT-N can handle more complicated kinds of coding tasks without making mistakes, stuff like that. So in 10 years AI systems are just pretty valuable economically, but I expect the AI industry to look more like today's tech industry - valuable but not economically transformative.

This is mostly because I don't expect just putting 1e30 FLOP of training compute into a system will be enough to get AI systems that can substitute for humans on most or all tasks of the economy. However, I would not be surprised by a mild acceleration of overall economic growth driven by the impact of AI.

Ajeya Cotra

This is mostly because I don't expect just putting 1e30 FLOP of training compute into a system will be enough to get AI systems that can substitute for humans on most or all tasks of the economy.

To check, do you think that having perfect ems of some productive human would be transformative, a la the Duplicator?

If so, what is the main reason you don't think a sufficiently bigger training run would lead to something of that level of impact? Is this related to the savannah-to-boardroom generalization / human-level learning-of-new things point I raised previously?

Ege Erdil

To check, do you think that having perfect ems of some productive human would be transformative, a la the Duplicator?

Eventually, yes, but even there I expect substantial amounts of delay (median of a few years, maybe as long as a decade) because people won't immediately start using the technology.

If so, what is the main reason you don't think a sufficiently bigger training run would lead to something of that level of impact? Is this related to the savannah-to-boardroom generalization / human-level learning-of-new things point I raised previously?

I think that's an important part of it, yes. I expect the systems we'll have in 10 years will be really good at some things with some bizarre failure modes and domains where they lack competence. My example of GPT-4 not being able to play tic-tac-toe is rather anecdotal, but I would worry about other things of a similar nature when we actually want these systems to replace humans throughout the economy.

Daniel Kokotajlo

Eventually, yes, but even there I expect substantial amounts of delay (median of a few years, maybe as long as a decade) because people won't immediately start using the technology.

Interestingly, I think in the case of ems this is more plausible than in the case of normal AGI. Because normal AGI will be more easily extendible to superhuman levels.

Ajeya Cotra

FWIW I think the kind of AGI you and I are imagining as the most plausible first AGI is pretty janky, and the main way I see it improving stuff is by doing normal ML R&D, not galaxy-brained "editing its own source code by hand" stuff. The normal AI R&D could be done by all the ems too.

(It depends on where the AI is at when you imagine dropping ems into the scenario.)

Daniel Kokotajlo

I agree with that. The jankiness is a point in my favor, because it means there's lots of room to grow by ironing out the kinks.

Daniel Kokotajlo

Overall Ege, thanks for writing that scenario! Here are some questions / requests for elaboration:

(1) So in your median world, when do we finally get to AGI, and what changes between 2030 and then that accounts for the difference?

(2) I take it that in this scenario, despite getting IMO gold etc. the systems of 2030 are not able to do the work of today's OAI engineer? Just clarifying. Can you say more about what goes wrong when you try to use them in such a role? Or do you think that AI R&D will indeed benefit from automated engineers, but that AI progress will be bottlenecked on compute or data or insights or something that won't be accelerating?

(3) What about AI takeover? Suppose an AI lab in 2030, in your median scenario, "goes rogue" and decides "fuck it, let's just deliberately make an unaligned powerseeking AGI and then secretly put it in charge of the whole company." What happens then?

Ege Erdil

(1) So in your median world, when do we finally get to AGI, and what changes between 2030 and then that accounts for the difference?

(2) I take it that in this scenario, despite getting IMO gold etc. the systems of 2030 are not able to do the work of today's remote OAI engineer? Just clarifying. Can you say more about what goes wrong when you try to use them in such a role? Or do you think that AI R&D will indeed benefit from automated engineers, but that AI progress will be bottlenecked on compute or data or insights or something that won't be accelerating?

(3) What about AI takeover? Suppose an AI lab in 2030, in your median scenario, "goes rogue" and decides "fuck it, let's just deliberately make an unaligned powerseeking AGI and then secretly put it in charge of the whole company." What happens then?

(1): I'm sufficiently uncertain about this that I don't expect my median world to be particularly representative of the range of outcomes I consider plausible, especially when it comes to giving a date. What I expect to happen is a boring process of engineering that gradually irons out the kinks of the systems, gradual hardware progress allowing bigger training runs, better algorithms allowing for better in-context learning, and many other similar things. As this continues, I expect to see AIs substituting for humans on more and more tasks in the economy, until at some point AIs become superior to humans across the board.

(2): AI R&D will benefit from AI systems, but they won't automate everything an engineer can do. I think when you try to use the systems in practical situations, they might lose coherence over long chains of thought, or be unable to effectively debug non-performant complex code, or not be able to have as good intuitions about which research directions would be promising, et cetera. In 10 years I fully expect many people in the economy to substantially benefit from AI systems, and AI engineers probably more than most.

(3): I don't think anything notable would happen. I don't believe the AI systems of 2030 will be capable enough to manage an AI lab.

Ajeya Cotra

I think Ege's median world is plausible, just like Daniel's median world; I think my probability on "Ege world or more chill than that" is lower than my probability on "Daniel world or less chill than that." Earlier I said 25% on Daniel-or-crazier, I think I'm at 15% on Ege-or-less-crazy.

Daniel Kokotajlo

Re: the "fuck it" scenario: What I'm interested in here is what skills you think the system would be lacking that would make it fail. For example, we already had a baby version of this with ChaosGPT4, which lacked strategic judgment, had a very high mistakes-to-ability-to-recover-from-mistakes ratio, and started from a bad position (being constantly monitored, zero human allies). So all it did was make some hilarious tweets and get shut down.

Ajeya Cotra

Ege, do you think you'd update if you saw a demonstration of sophisticated sample-efficient in-context learning and far-off-distribution transfer? 

E.g. suppose some AI system was trained to learn new video games: each RL episode was it being shown a video game it had never seen, and it's supposed to try to play it; its reward is the score it gets. Then after training this system, you show it a whole new type of video game it has never seen (maybe it was trained on platformers and point-and-click adventures and visual novels, and now you show it a first-person-shooter for the first time). Suppose it could get decent at the first-person-shooter after like a subjective hour of messing around with it. If you saw that demo in 2025, how would that update your timelines?

Ege Erdil

Ege, do you think you'd update if you saw a demonstration of sophisticated sample-efficient in-context learning and far-off-distribution transfer?

Yes.

Suppose it could get decent at the first-person-shooter after like a subjective hour of messing around with it. If you saw that demo in 2025, how would that update your timelines?

I would probably update substantially towards agreeing with you.

Daniel Kokotajlo

(1): I'm sufficiently uncertain about this that I don't expect my median world to be particularly representative of the range of outcomes I consider plausible, especially when it comes to giving a date. What I expect to happen is a boring process of engineering which gradually irons out the kinks of the systems, gradual hardware progress allowing bigger training runs, better algorithms allowing for better in-context learning, and many other similar things. As this continues, I expect to see AIs substituting for humans on more and more tasks in the economy, until at some point AIs become superior to humans across the board.

Your median is post-2060 though. So I feel like you need to justify why this boring process of engineering is going to take 30 more years after 2030. Why 30 years and not 300? Indeed, why not 3?

Daniel Kokotajlo

(2): AI R&D will benefit from AI systems, but they won't automate everything an engineer can do. I think when you try to use the systems in practical situations, they might lose coherence over long chains of thought, or be unable to effectively debug non-performant complex code, or not be able to have as good intuitions about which research directions would be promising, et cetera. In 10 years I fully expect many people in the economy to substantially benefit from AI systems, and AI engineers probably more than most.

How much do you think they'll be automating/speeding things up? Can you give an example of a coding task such that, if AIs can do that coding task by, say, 2025, you'll update significantly towards shorter timelines, on the grounds that they are by 2025 doing things you didn't expect to be doable by 2030?

(My position is that all of these deficiencies exist in current systems but (a) will rapidly diminish over the next few years and (b) aren't strong blockers to progress anyway, e.g. even if they don't have good research taste they can still speed things up substantially just by doing the engineering and cutting through the schlep)

Ege Erdil

Your median is post-2060 though. So I feel like you need to justify why this boring process of engineering is going to take 30 more years after 2030. Why 30 years and not 300? Indeed, why not 3?

I don't think it's going to take ~30 years (really 40, per the distribution I submitted) after 2030 -- that's just my median. I think there's a 1/3 chance it takes more than 75 years and a 1/5 chance it takes more than 175 years.

If you're asking me to justify why my median is around 2065, I think this is not really that easy to do as I'm essentially just expressing the betting odds I would accept based on intuition.

Formalizing it is tricky, but I could say that I don't find it that plausible that the problem of building AI is so hard we won't be able to do it even after 300 years of hardware and software progress. Just the massive scaling up of compute we could get from hardware progress and economic growth over that kind of timescale would enable things that look pretty infeasible over the next 20 or 30 years.

Far-off-distribution transfer

habryka

The Ege/Ajeya point about far-off-distribution transfer seems like an interesting maybe-crux, so let's go into that for a bit.

My guess is Ajeya has pretty high probability that that kind of distribution transfer will happen within the next few years and very likely the next decade?

Ajeya Cotra

Yeah, FWIW I think the savannah-to-boardroom transfer point is probably underlying past-Eliezer's views (not sure about current Eliezer) and also a lot of "stochastic parrot"-style skepticism. I think it's a good point that's under-discussed by the short-timelines crowd, though I don't think it's decisive.

Ajeya Cotra

My guess is Ajeya has pretty high probability that that kind of distribution transfer will happen within the next few years and very likely the next decade?

Actually I'm pretty unsure, and slightly lean toward no. I just think it'll take a lot of hard work to make up for the weaknesses of not having transfer this good. Paul has a good unpublished Google doc titled "Doing without transfer." I think by the time systems are transformative enough to massively accelerate AI R&D, they will still not be that close to savannah-to-boardroom level transfer, but it will be fine because they will be trained on exactly what we wanted them to do for us. (This btw also underlies some lower-risk-level intuitions I have relative to MIRI crowd.)

habryka

Actually I'm pretty unsure, and slightly lean toward no.

Oh, huh, that is really surprising to me. But good to have that clarified.

Ajeya Cotra

Yeah, I just think the way we get our OAI-engineer-replacing-thingie is going to be radically different cognitively than human OAI-engineers, in that it will have coding instincts honed through ancestral memory the way grizzly bears have salmon-catching instincts baked into them through their ancestral memory. For example, if you give it a body, I don't think it'd learn super quickly to catch antelope in the savannah, the way a baby human caveperson could learn to code if you transported them to today.

But it's salient to me that this might just leave a bunch of awkward gaps, since we're trying to make do with systems holistically less intelligent than humans, but just more specialized to coding, writing, and so on. This is why I think the Ege world is plausible.

I also dislike using the term AGI for this reason. (Or rather, I think there is a thing people have in mind by AGI which makes sense, but it will come deep into the Singularity, after the earlier transformative AI systems that are not AGI-in-this-sense.)

Ege Erdil

I also dislike using the term AGI for this reason.

In my median world, the term "AGI" also becomes increasingly meaningless because different ways people have operationalized criteria for what counts as AGI and what doesn't begin to come apart. For example, we have AIs that can pass the Turing test for casual conversation (even if judges can ask about recent events), but these AIs can't be plugged in to do an ordinary job in the economy.

Ajeya Cotra

In my median world, the term "AGI" also becomes increasingly meaningless because different ways people have operationalized criteria for what counts as AGI and what doesn't begin to come apart. For example, we have AIs that can pass the Turing test for casual conversation (even if judges can ask about recent events), but these AIs can't be plugged in to do an ordinary job in the economy.

Yes, I'm very sympathetic to this kind of thing, which is why I like TAI (and it's related to the fact that I think we'll first have grizzly-bears-of-coding, not generally-intelligent-beings). But it bites much less in my view because it's all much more compressed and there's a pretty shortish period of a few years where all plausible things people could mean by AGI are achieved, including the algorithm that has savannah-to-boardroom-level transfer. 

A concrete scenario & where its surprises are

Daniel Kokotajlo

We can delete this hook later if no one bites, but in case someone does, here's a scenario I think it would be productive to discuss:

(1) Q1 2024: A bigger, better model than GPT-4 is released by some lab. It's multimodal; it can take a screenshot as input and output not just tokens but keystrokes and mouseclicks and images. Just like with GPT-4 vs. GPT-3.5 vs. GPT-3, it turns out to have new emergent capabilities. Everything GPT-4 can do, it can do better, but there are also some qualitatively new things that it can do (though not super reliably) that GPT-4 couldn't do.

(2) Q3 2024: Said model is fine-tuned to be an agent. It was already better at being strapped into an AutoGPT harness than GPT-4 was, so it was already useful for some things, but now it's being trained on tons of data to be a general-purpose assistant agent. Lots of people are raving about it. It's like another ChatGPT moment; people are using it for all the things they used ChatGPT for but then also a bunch more stuff. Unlike ChatGPT you can just leave it running in the background, working away at some problem or task for you. It can write docs and edit them and fact-check them; it can write code and then debug it.

(3) Q1 2025: Same as (1) all over again: An even bigger model, even better. Also it's not just AutoGPT harness now, it's some more sophisticated harness that someone invented. Also it's good enough to play board games and some video games decently on the first try.

(4) Q3 2025: OK now things are getting serious. The kinks have generally been worked out. This newer model is being continually trained on oodles of data from a huge base of customers; they have it do all sorts of tasks and it tries and sometimes fails and sometimes succeeds and is trained to succeed more often. Gradually the set of tasks it can do reliably expands, over the course of a few months. It doesn't seem to top out; progress is sorta continuous now -- even as the new year comes, there's no plateauing, the system just keeps learning new skills as the training data accumulates. Now many millions of people are basically treating it like a coworker and virtual assistant. People are giving it their passwords and such and letting it handle life admin tasks for them, help with shopping, etc. and of course quite a lot of code is being written by it. Researchers at big AGI labs swear by it, and rumor is that the next version of the system, which is already beginning training, won't be released to the public because the lab won't want their competitors to have access to it. Already there are claims that typical researchers and engineers at AGI labs are approximately doubled in productivity, because they mostly have to just oversee and manage and debug the lightning-fast labor of their AI assistant. And it's continually getting better at doing said debugging itself.

(5) Q1 2026: The next version comes online. It is released, but it refuses to help with ML research. Leaks indicate that it doesn't refuse to help with ML research internally, and in fact is heavily automating the process at its parent corporation. It's basically doing all the work by itself; the humans are basically just watching the metrics go up and making suggestions and trying to understand the new experiments it's running and architectures it's proposing.

(6) Q3 2026: Superintelligent AGI happens, by whatever definition is your favorite. And you see it with your own eyes.

Question: Suppose this scenario happens. What does your credence in "AGI by 2027" look like at each of the 6 stages? E.g. what are the biggest updates, and why?

My own first-pass unconfident answer is: 
0 -- 50%
1 -- 50%
2 -- 65%
3 -- 70%
4 -- 90%
5 -- 95%
6 -- 100%

Ajeya Cotra

(3) Q1 2025: Same as (1) all over again: An even bigger model, even better. Also it's not just AutoGPT harness now, it's some more sophisticated harness that someone invented. Also it's good enough to play board games and some video games decently on the first try.

I don't know how much I care about this (not zero), but I think someone with Ege's views should care a lot about how it was trained. Was it trained on a whole bunch of very similar board games and video games? How much of a distance of transfer is this, if savannah to boardroom is 100?

Ege Erdil

FWIW I interpreted this literally: we have some bigger model like ChatGPT that can play some games decently on the first try, and conditional on (2) my median world has those games being mostly stuff similar to what it's seen before.

So I'm not assuming much evidence of transfer from (2), only some mild amount.

habryka

Yeah, let's briefly have people try to give probability estimates here, though my model of Ege expects the first few stages to have a ton of ambiguity in their operationalization, which will make it hard to give concrete probabilities.

Ajeya Cotra

+1, I also find the ambiguity makes answering this hard

I'll wait for Ege to answer first.

Ege Erdil

Re: Daniel, according to my best interpretation of his steps:

0 -- 6%
1 -- 6%
2 -- 12%
3 -- 15%
4 -- 30%
5 -- 95%
6 -- 100%

Ajeya Cotra

Okay here's my answer:

0 -- 20%
1 -- 28%
2 -- 37%
3 -- 50%
4 -- 75%
5 -- 87%
6 -- 100%

My updates are spread out pretty evenly because the whole scenario seems qualitatively quite plausible and most of my uncertainty is simply whether it will take more scale or more schlep at each stage than is laid out here (including stuff like making it more reliable for a combo of PR and regulation and usable-product reasons).

Daniel Kokotajlo

Thanks both! I am excited about this for a few reasons. One, I think it might help to focus the discussion on the parts of the story that are the biggest updates for you (and also on the parts that are importantly ambiguous! I'm curious to hear about those!), and two, because as the next three years unfold, we'll be able to compare what actually happens to this scenario.

Ege Erdil

Unfortunately, I think the scenarios are vague enough that, as a practical matter, it will be tricky to adjudicate whether they've happened or not.

Daniel Kokotajlo

I agree, but I still think it's worthwhile to do this. Also this was just a hastily written scenario, I'd love to improve it and make it more precise, and I'm all ears for suggestions!

Ajeya Cotra

Ege, I'm surprised you're at 95% at stage 5, given that stage 5's description is just that AI is doing a lot of AI R&D and leaks suggest it's going fast. If your previous timelines were several decades, then I'd think even with non-god-like AI systems speeding up R&D it should take like a decade?

Ege Erdil

I think once you're at step 5 it's overwhelmingly likely that you already have AGI. The key sentence for me is "it's basically doing all the work by itself" - I have a hard time imagining worlds where an AI can do basically all of the work of running an AI lab by itself but AGI has still not been achieved. 

If the AI's role is more limited than this, then my update from 4 to 5 would be much smaller.

Ajeya Cotra

I thought Daniel said it was doing all the ML R&D by itself, and the humans were managing it (the AIs are in the role of ICs and the humans are in the role of managers at a tech company). I don't think it's obvious that just because some AI systems can pretty autonomously do ML R&D, they can pretty autonomously do everything, and I would have expected your view to agree with me more there. Though maybe you think that if it's doing ML R&D autonomously, it must have intense transfer / in-context-learning and so it's almost definitely across-the-board superhuman?

Ege Erdil

If it's only doing the R&D then I would be lower than 95%, and the exact probability I give for AGI just depends on what that is supposed to mean. That's an important ambiguity in the operationalization Daniel gives, in my opinion.

In particular, if you have a system that can somehow basically automate AI R&D but is unable to take over the other tasks involved in running an AI lab, that's something I don't expect and would push me far below the 95% forecast I provided above. In this case, I might only update upwards by some small amount based on (4) -> (5), or maybe not at all.

Overall summary, takeaways and next steps

habryka

Here is a summary of the discussion so far:

Daniel made an argument against Hofstadter's law for trend extrapolation and we discussed the validity of that for a bit.

A key thing that has come up as an interesting crux/observation is that Ege and Ajeya both don't expect a massive increase in transfer learning ability in the next few years. For Ege this matters a lot because it's one of the top reasons he expects AI will not speed up the economy and AI development that much. Ajeya thinks we can probably speed up AI R&D anyway by making grizzly-bear-like AI that doesn't have transfer as good as humans, but is just really good at ML engineering and AI R&D because it was directly trained to be.

This makes observing substantial transfer learning a pretty relevant crux for Ege and Ajeya in the next few years/decades. Ege says he'd have timelines more similar to Ajeya's if he observed this. 

Daniel and Ajeya both think that the most plausible scenario is grizzly-bear-like AI with subhuman transfer but human-level or superhuman ML engineering skills, but while Daniel thinks it'll be relatively fast to work with the grizzly-bear-AIs to massively accelerate R&D, Ajeya thinks that the lower-than-human level "general intelligence" / "transfer" will be a hindrance in a number of little ways, making her think it's plausible we'll need bigger models and/or more schlep. If Ajeya saw extreme transfer work out, she'd update more toward thinking everything will be fast and easy, and thus have Daniel-like timelines (even though Daniel himself doesn't consider extreme transfer to be a crux for him.)

Daniel and Ajeya tried to elicit what concretely Ege expects to happen over the coming decades when AI progress continues but doesn't end up that transformative. Ege expects that AI will have a large effect on the economy, but assigns a substantial amount of probability to persistent deficiencies that prevent it from fully automating AI R&D or very substantially accelerating semiconductor progress.

(Ajeya, Daniel and Ege all thumbs-up this summary)

Ajeya Cotra

Okay thanks everyone, heading out!

habryka

Thank you Ajeya!

Daniel Kokotajlo

Yes, thanks Ajeya, Ege, and Oliver! Super fun.

habryka

Thinking about future discussions on this topic: I think putting probabilities on the scenario Daniel outlined was a bit hard given the limited time we had, but I quite like the idea of doing a more parallelized and symmetric version of this kind of thing, where multiple participants each write out a concrete sequence of events and other people then forecast how they would update on each of those observations. That seems like a fun way to elicit disagreements and cruxes.

Comments

I had a nice conversation with Ege today over dinner, in which we identified a possible bet to make! Something I think will probably happen in the next 4 years, that Ege thinks will probably NOT happen in the next 15 years, such that if it happens in the next 4 years Ege will update towards my position and if it doesn't happen in the next 4 years I'll update towards Ege's position.

Drumroll...

I (DK) have lots of ideas for ML experiments, e.g. dangerous capabilities evals, e.g. simple experiments related to paraphrasers and so forth in the Faithful CoT agenda. But I'm a philosopher, I don't code myself. I know enough that if I had some ML engineers working for me that would be sufficient for my experiments to get built and run, but I can't do it by myself. 

When will I be able to implement most of these ideas with the help of AI assistants basically substituting for ML engineers? So I'd still be designing the experiments and interpreting the results, but AutoGPT5 or whatever would be chatting with me and writing and debugging the code.

I think: Probably in the next 4 years. Ege thinks: probably not in the next 15.

Ege, is this an accurate summary?

Here's a sketch for what I'd like to see in the future -- a better version of the scenario experiment done above:

  • 2-4 people sit down for a few hours together.
  • For the first 1-3 hours, they each write a Scenario depicting their 'median future' or maybe 'modal future.' The scenarios are written similarly to the one I wrote above, with dated 'stages.' The scenarios finish with superintelligence, or else it-being-clear-superintelligence-is-many-decades-away-at-least.
  • As they write, they also read over each other's scenarios and ask clarifying questions. E.g. "You say that in 2025 they can code well but unreliably -- what do you mean exactly? How much does it improve the productivity of, say, OpenAI engineers?"
  • By the end of the period, the scenarios are finished & everyone knows roughly what each stage means because they've been able to ask clarifying questions.
  • Then for the next hour or so, they each give credences for each stage of each scenario. Credences in something like "ASI by year X" where X is the year ASI happens in the scenario.
  • They also of course discuss and critique each other's credences, and revise their own.
  • At the end, hopefully some interesting movements will have happened in people's mental models and credences, and hopefully some interesting cruxes will have surfaced -- e.g. it'll be more clear what kinds of evidence would actually cause timelines updates, were they to be observed.
  • The scenarios, credences, and maybe a transcript of the discussion then gets edited and published.
kave

Curated. I feel like over the last few years my visceral timelines have shortened significantly. This is partly from contact with LLMs, particularly their increased coding utility, and a lot of it is downstream of Ajeya's and Daniel's models and outreach (I remember spending an afternoon on an arts-and-crafts "build your own timeline distribution" exercise that Daniel had nerdsniped me with). I think a lot of people are in a similar position and have been similarly influenced. It's nice to get more details on those models and the differences between them, as well as to hear Ege pushing back with "yeah, but what if there are some pretty important pieces that are missing and won't get scaled away?", which I hear from my environment much less often.

There are a couple of pieces of extra polish that I appreciate. First, having some specific operationalisations with numbers and distributions up-front is pretty nice for grounding the discussion. Second, I'm glad that there was a summary extracted out front, as sometimes the dialogue format can be a little tricky to wade through.

On the object level, I thought the focus on schlep in the Ajeya-Daniel section and slowness of economy turnover in the Ajaniel-Ege section was pretty interesting. I think there's a bit of a cycle with trying to do complicated things like forecast timelines, where people come up with simple compelling models that move the discourse a lot and sharpen people's thinking. People have vague complaints that the model seems like it's missing something, but it's hard to point out exactly what. Eventually someone (often the person with the simple model) is able to name one of the pieces that is missing, and the discourse broadens a bit. I feel like schlep is a handle that captures an important axis that all three of our participants differ on.

I agree with Daniel that a pretty cool follow-up activity would be an expanded version of the exercise at the end with multiple different average worlds.

Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute. The 70b models are borderline capable of following routine instructions to label data or pour it into specified shapes. GPT-4 is almost robustly capable of that. There are 3-4 more effective OOMs in the current investment scaling sprint (3-5 years), so another 2 steps of improvement if there were enough equally useful training data to feed the process, which there isn't. At some point, training will ingest books as images that weren't previously available as high-quality text, which might partially compensate for running out of text data. Perhaps there are 1.5 steps of improvement over GPT-4 in total despite the competence-dense data shortage. (All of this happens too quickly to be restrained by regulation, and without AGI never becomes more scary than useful.)

Leela Zero is a 50m-parameter model that plays superhuman Go, a product of the quality of its synthetic dataset. Just as with images, sound, natural languages, and programming languages, we can think of playing Go and writing formal proofs as additional modalities. A foundational model that reuses circuits between modalities would be able to take the competence from modalities where synthetic data recipes are known and channel it into better reasoning in natural language, understanding human textbooks and papers, and getting closer to improving the quality of its natural language datasets. Competence at in-context learning or sample efficiency during pre-training is only relevant where the system is unable to do real work on its own, which is the reason essential use of RL can seem necessary for AGI. But once a system is good enough to pluck the low-hanging R&D fruit around contemporary AI architectures, these obstructions are gone. (Productively tinkering with generalized multimodality and synthetic data doesn't require going outside the scale of preceding models, which keeps existing regulation too befuddled to intervene.)

Leela Zero uses MCTS; it doesn't play superhuman Go in one forward pass (like GPT-4 can do in some subdomains) (I think -- I didn't find any evaluations of Leela Zero at one forward pass), and I'd guess that the network itself doesn't contain any more generalized game-playing circuitry than an LLM, it just has good intuitions for Go.

Nit:

Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute.

1.5 to 2 OOMs? 7b to 70b is 1 OOM of compute, adding in chinchilla efficiency would make it like 1.5 OOMs of effective compute, not 2. And llama 70b to gpt-4 is 1 OOM effective compute according to openai naming - llama70b is about as good as gpt-3.5. And I'd personally guess gpt4 is 1.5 OOMs effective compute above llama70b, not 2.

Leela Zero uses MCTS, it doesnt play superhuman in one forward pass

Good catch -- since the context from LLMs is performance in one forward pass, the claim should be about that, and I'm not sure it's superhuman without MCTS. I think the intended point survives this mistake: it's a much smaller model than modern LLMs that has relatively very impressive performance, primarily because of the high quality of the synthetic dataset it effectively trains on. Thus models at the scale of near-future LLMs will have a reality-warping amount of dataset-quality overhang. This makes the ability of LLMs to improve datasets much more impactful than their competence at other tasks, hence the anchors of capability I was pointing out were about labeling and rearranging data according to instructions. It also makes compute-threshold-gated regulation potentially toothless.

Subjectively there is clear improvement between 7b vs. 70b vs. GPT-4, each step 1.5-2 OOMs of training compute.

1.5 to 2 OOMs? 7b to 70b is 1 OOM of compute, adding in chinchilla efficiency would make it like 1.5 OOMs of effective compute, not 2.

With Chinchilla scaling, compute is the square of model size, so 2 OOMs under that assumption. Of course current 7b models are overtrained compared to Chinchilla (all sizes of LLaMA-2 are trained on 2T tokens), which might be your point. And Mistral-7b is less obviously a whole step below LLaMA-2-70b, so the full-step-of-improvement claim should be about earlier 7b models that are more representative of how the frontier of scaling advances, where a Chinchilla-like tradeoff won't yet completely break down, probably preserving the data-squared compute scaling estimate (parameter count no longer works very well as an anchor with all the MoE and sparse pre-training stuff). It's not clear what assumptions make it 1.5 OOMs instead of either 1 or 2 -- possibly Chinchilla-inefficiency of overtraining?

And llama 70b to gpt-4 is 1 OOM effective compute according to openai naming - llama70b is about as good as gpt-3.5.

I was going from the EpochAI estimate that puts LLaMA 2 at 8e23 FLOPs and GPT-4 at 2e25 FLOPs, which is 1.4 OOMs. I'm thinking of effective compute in terms of the compute necessary for achieving the same pre-training loss (using a lower amount of literal compute with pre-training algorithmic improvement), not in terms of meaningful benchmarks for fine-tunes. In this sense overtrained smaller LLaMAs get even less effective compute than literal compute, since they employ it to reach their loss Chinchilla-inefficiently. We can then ask how much subjective improvement a given amount of pre-training loss scaling (in terms of effective compute) gets us. It's not that useful in detail, but it gives an anchor for improvement from scale alone in the coming years, before industry and economy force a slowdown (absent AGI): it goes beyond GPT-4 about as far as GPT-4 is beyond LLaMA-2-13b.
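
To make the arithmetic in this exchange explicit, here is a minimal sketch; the 6*N*D and D ≈ 20*N approximations are standard Chinchilla rules of thumb (my assumption, not stated above), while the 8e23 and 2e25 FLOP figures are the Epoch estimates just quoted.

```python
import math

# (a) "Compute is the square of model size" under Chinchilla assumptions:
#     training FLOPs ~ 6*N*D, with Chinchilla-optimal D ~ 20*N
#     (overtrained models like LLaMA-2, trained on 2T tokens, break this).
def chinchilla_train_flops(n_params):
    tokens = 20 * n_params
    return 6 * n_params * tokens

gap_7b_70b = math.log10(chinchilla_train_flops(70e9) / chinchilla_train_flops(7e9))
print(f"7b -> 70b: {gap_7b_70b:.1f} OOMs of Chinchilla-optimal compute")    # ~2.0

# (b) The 1.4 OOMs figure from the Epoch estimates above:
#     LLaMA 2 ~ 8e23 FLOPs, GPT-4 ~ 2e25 FLOPs.
gap_llama2_gpt4 = math.log10(2e25 / 8e23)
print(f"LLaMA 2 -> GPT-4: {gap_llama2_gpt4:.1f} OOMs of training compute")  # ~1.4
```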

Iirc, original alphago had a policy network that was grandmaster level but not superhuman without MCTS.

This is not quite true. Raw policy networks of AlphaGo-like models are often at a level around 3 dan in amateur rankings, which would qualify as a good amateur player but nowhere near the equivalent of grandmaster level. If you match percentiles in the rating distributions, 3d in Go is perhaps about as strong as an 1800 elo player in chess, while "master level" is at least 2200 elo and "grandmaster level" starts at 2500 elo.

Edit: Seems like policy networks have improved since I last checked these rankings, and the biggest networks currently available for public use can achieve a strength of possibly as high as 6d without MCTS. That would be somewhat weaker than a professional player, but not by much. Still far off from "grandmaster level" though.

According to figure 6b in "Mastering the Game of Go without Human Knowledge", the raw policy network has 3055 elo, which according to this other page (I have not checked that these Elos are comparable) makes it the 465th best player. (I don’t know much about this and so might be getting the inferences wrong, hopefully the facts are useful)

I found the discussion around Hofstadter's law in forecasting really useful, as I've definitely found myself and others adding fudge factors to timelines to reflect unknown unknowns, which may or may not be relevant when extrapolating capabilities from compute.

In my experience, many people feel that current tools are primarily limited by their ability to plan and execute over longer time horizons. Once we have publicly available tools that are capable of carrying out even simple multi-step plans ("book me a great weekend away with my parents with a budget of $x and send me the itinerary"), I can see timelines amongst the general public being dramatically reduced.

I think unknown unknowns are a different phenomenon than Hofstadter's Law / Planning Fallacy. My thinking on unknown unknowns is that they should make people spread out their timelines distribution, so that it has more mass later than they naively expect, but also more mass earlier than they naively expect. (Just as there are unknown potential blockers, there are unknown potential accelerants.) Unfortunately I think many people just do the former and not the latter, and this is a huge mistake.
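
A toy illustration of that point (my own made-up numbers): widening a lognormal timelines distribution keeps the median roughly fixed while adding probability mass to both tails, whereas only adding late-side mass would push the median later.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "years until X" distributions sharing the same median (~10 years):
naive   = rng.lognormal(mean=np.log(10), sigma=0.3, size=100_000)  # narrow, naive view
widened = rng.lognormal(mean=np.log(10), sigma=0.8, size=100_000)  # unknown unknowns added

for name, sample in [("naive", naive), ("widened", widened)]:
    p10, p50, p90 = np.percentile(sample, [10, 50, 90])
    print(f"{name:8s} 10th={p10:5.1f}y  median={p50:5.1f}y  90th={p90:5.1f}y")

# The widened distribution has more mass in BOTH tails; spreading out is not the
# same as uniformly shifting everything later.
```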

Interesting. I fully admit most of my experience with unknown unknowns comes from either civil engineering projects or bringing consumer products to market, both situations where the unknown unknowns are disproportionately blockers. But this doesn't seem to be the case with things like Moore's Law or continual improvements in solar panel efficiency where the unknowns have been relatively evenly distributed or even weighted towards being accelerants. I'd love to know if you have thoughts on what makes a given field more likely to be dominated by blockers or accelerants!

Could you elaborate on what it would mean to demonstrate 'savannah-to-boardroom' transfer? Our architecture was selected for in the wilds of nature, not our training data. To me it seems that when we use an architecture designed for language translation for understanding images we've demonstrated a similar degree of transfer.

I agree that we're not yet there on sample efficient learning in new domains (which I think is more what you're pointing at) but I'd like to be clearer on what benchmarks would show this. For example, how well GPT-4 can integrate a new domain of knowledge from (potentially multiple epochs of training on) a single textbook seems a much better test and something that I genuinely don't know the answer to.