A lot of people I trust place relatively high confidence in near-term AI timelines. For example, most people in the LessWrong AI timelines thread from last year had shorter timelines than me, though it may have been because I interpreted the question a bit differently than most everyone else.
In this post, I'll cover three big reasons to expect long AI timelines, which I take to be the thesis that transformative AI-related phenomena won't happen for at least another 50 years (currently >2071). Roughly speaking, the three reasons are:
- Technological deployment lag: Most technologies take decades between when they're first developed and when they become widely impactful.
- Overestimating the generality of AI technology: Many AI scientists in the 1950s and 1960s incorrectly expected that cracking computer chess would automatically crack other tasks as well. I suspect a similar phenomenon is happening in people's minds today when they extrapolate current AI.
- Regulation will slow things down: Lots of big technologies have been slowed down by regulation. Nuclear energy is an obvious example of a technology whose adoption has been hampered by regulation, but other cases exist too.
Technological deployment lag
From at least two perspectives, it makes sense to care more about when a technology is impactful, rather than when it first gets developed in a lab.
The first perspective is the perspective of an ordinary person. Ordinary people are hardly affected by isolated technological achievements. Such an achievement might help their stock portfolios, if it leads investors to become much more optimistic about future profits in the relevant industries. But other than that, the ordinary person will care far more about when they can actually see the results of a technology than about when it is first developed.
The second perspective is the perspective of a serious technological forecaster. These people care a lot about timing because the order of technological developments matters a lot for policy. To give a simple example, they care a lot about whether cheap and reliable solar energy will be developed before fusion power, because it tells them what type of technology society should invest in to stop climate change.
I care most about the second perspective, though it's worth noting cases where the first perspective might still matter. Consider a scenario in which AI researchers are going around declaring that "AGI is in 10 years." 10 years pass and an AGI is indeed developed in a lab somewhere, but with no noticeable impact on everyday life. People may grow distrustful of such proclamations, even if they're ultimately technically proven right.
While it might seem obvious to some that we should make a distinction between when a technology is developed and when it actually starts having a large impact, I'm pointing it out because I mostly don't get the impression that AI forecasting literature treats such a distinction as important.
Nearly all AI timeline surveys and forecasts I've been acquainted with simply take it as a starting assumption that what we care about is when advanced AI is developed, somewhere, rather than some side effect of that development. While I admit that it might be more reasonable to care about the specific moment in time when advanced AI is developed in a lab (particularly if we accept some local "foom" AI takeoff scenarios), it's not at all obvious to me that it is. If you disagree, I would prefer you to at least carefully outline your reasoning before spitting out a date.
One main reason why we might care most about the date of development is that we think that after sufficiently advanced AI is developed, the effects will happen almost instantaneously. The most extreme version of this thesis is the one where AI self-improves upon getting past some critical threshold, and takes over the whole world within a few weeks.
Most technologies, however, don't generally have immediate widespread effects. Our World in Data produced a great chart showing the typical timescales for technological adoption. It's worth checking out the whole article.
We can see from the chart that most technologies take decades between the time when just a few people have access to them and the time when they're ubiquitous. The chart likely underestimates the true lag between development and impact, however, because we also need to take into account the lag between when technologies are developed and when a non-negligible fraction of the population has access to them. Furthermore, this chart only shows adoption in the United States, a rich nation. Adoption trends worldwide are even slower.
(Consider that, at the time of writing, predictors on Metaculus expect transformative economic growth to come long after they expect the first AGI to be developed.)
One objection is that we should care about AI's impacts long before it becomes ubiquitous in households—it might be adopted by businesses and governments first.
There are two forms this objection might take. The first form imagines that businesses or governments would be much faster to adopt technologies than households. I am uncertain about the strength of this objection, and I’m not sure what information might be relevant for answering it.
The second form of this objection is that AI might have huge transformative impacts even if only a few businesses or governments adopt it. The classic justification for this thesis is that one or a few AI projects become overwhelmingly powerful in a localized intelligence explosion, rather than having large effects by diffusion. In that case, all the standard arguments against AI foom are applicable here.
Another objection is that some technologies, especially smartphones, were adopted very rapidly compared to the other ones. AI is conceivably similar in this respect. The rapid adoption of smartphones seems to derive from at least one of two reasons: it could be that smartphones were unusually affordable for households, or that they experienced unusually high demand upon their introduction.
It's not clear to me whether AI technology will be unusually affordable relative to other technologies, and I lean towards doubting it. But it appears probable to me that AI will experience unusually high demand upon its introduction. Overall I'm not sure how to weigh this consideration, but it definitely pushes me in the direction of thinking that AI technologies will probably not have a very long adoption timeline (say, more than 30 years after their introduction before they start having large effects).
Another reason for doubting that AI will have immediate widespread impacts is that previous general purpose technologies failed to have such impacts too. Economist Robert Solow famously quipped in 1987 that "You can see the computer age everywhere but in the productivity statistics." His observation was later dubbed the productivity paradox.
By the late 1990s, labor productivity in the United States had finally accelerated, culminating in an economic boom. Economists have provided a few explanations for this lag. For instance, Wikipedia points out that we may have simply mismeasured growth by overestimating inflation. Philippe Aghion and Peter Howitt, however, outline an alternative and common-sense explanation in chapter 9 of The Economics of Growth,
As David (1990) and Lipsey and Bekar (1995) have argued, GPTs [general purpose technologies] like the steam engine, the electric dynamo, the laser, and the computer require costly restructuring and adjustment to take place, and there is no reason to expect this process to proceed smoothly over time. Thus, contrary to the predictions of real-business-cycle theory, the initial effect of a “positive technology shock” may not be to raise output, productivity, and employment but to reduce them [...]
An alternative explanation for slowdowns has been developed by Helpman and Trajtenberg (1998a) using the Schumpeterian apparatus where R&D resources can alternatively be used in production. The basic idea of this model is that GPTs do not come ready to use off the shelf. Instead, each GPT requires an entirely new set of intermediate goods before it can be implemented. The discovery and development of these intermediate goods is a costly activity, and the economy must wait until some critical mass of intermediate components has been accumulated before it is profitable for firms to switch from the previous GPT. During the period between the discovery of a new GPT and its ultimate implementation, national income will fall as resources are taken out of production and put into R&D activities aimed at the discovery of new intermediate input components.
In line with these expectations, Daniel Kokotajlo pointed to this paper, which complements this analysis by applying it to the current machine learning era.
Overestimating the generality of AI technology
Many very smart AI scientists in the 1950s and 1960s believed that human-level AI was imminent. As many later pointed out, these failed predictions by themselves provide evidence that AI will take longer to develop than we think. Yet that's not the only reason why I'm bringing them up.
Instead, I want to focus on why AI scientists once believed that developing human-level AI would be relatively easy. The main reason, I suspect, is that researchers were too optimistic about the generality of their techniques. The case of computer chess is illustrative here. In his 1950 paper in which he provided an algorithm for perfect chess play, Claude Shannon wrote,
This paper is concerned with the problem of constructing a computing routine or "program" for a modern general purpose computer which will enable it to play chess. Although perhaps of no practical importance, the question is of theoretical interest, and it is hoped that a satisfactory solution of this problem will act as a wedge in attacking other problems of a similar nature and of greater significance.
Among the problems that Shannon had hoped would be attacked indirectly by solving chess, he listed,
- Machines capable of translating from one language to another.
- Machines capable of orchestrating a melody.
We now know that these problems are, at best, only incidentally related to the problem of playing chess. At worst, they are downright irrelevant to it. Few AI researchers would make the same mistake today.
Yet I see elements of Shannon's mistake in the reasoning of many people today. I’ll walk through my reasons.
First, consider why Shannon might have expected progress in chess to aid progress in language translation. We could imagine, in some abstract sense, that chess and language translation are the same type of problem. Mathematically speaking, a chess engine is simply a mapping from chess board states to moves. Similarly, language translation is simply a mapping from sentences in one language to sentences in another language.
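To make the shared formalism concrete, here is a minimal sketch in Python. All of the names (`ChessEngine`, `Translator`, `from_table`) and the toy lookup tables are my own illustrative assumptions, not anything from Shannon's paper. The point is only that both tasks fit the same abstract signature, and that the signature says nothing about how hard the mapping is to construct.

```python
from typing import Callable, Dict

# Hypothetical type aliases, for illustration only.
BoardState = str  # some encoding of a chess position
Move = str        # some encoding of a move, e.g. "e2e4"
Sentence = str

# Abstractly, both tasks are "just" functions from inputs to outputs.
ChessEngine = Callable[[BoardState], Move]
Translator = Callable[[Sentence], Sentence]


def from_table(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a mapping-style function from a lookup table (a toy stand-in)."""
    def mapping(x: str) -> str:
        return table[x]
    return mapping


# The same constructor type-checks for both tasks, which is exactly why the
# formal similarity is too weak to tell us whether progress will transfer.
toy_engine: ChessEngine = from_table({"start position": "e2e4"})
toy_translator: Translator = from_table({"bonjour": "hello"})
```

The two toy functions are interchangeable at the level of types, yet nothing about building the first one well helps with building the second one well.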
Beyond the simple mathematical formalism, however, there are substantial real differences between the two tasks. While computer chess can feasibly be solved by brute force, language translation requires an extremely nuanced understanding of the rules native speakers use to compose their speech.
One reason why Shannon might not have given this argument much thought is that he wasn't thinking about how to do language translation in the moment; he was more interested in solving chess, and the other problems were afterthoughts.
We can view his stance through an analogy to construal level theory, or as LessWrong likes to put it, near vs. far thinking. All of the concrete ways that chess could be tackled were readily apparent in Claude Shannon's mind, but the same could not be said about natural language translation. Rather than identifying a specific similarity between the two tasks, he could have made the forgivable mistake of assuming that a vague similarity between them was sufficient for his prediction.
It’s a bit like the planning fallacy. When planning our time, we can see all the ways things could go right and according to schedule, since those things are concrete. The ways that things could go wrong are more abstract, and thus occupy less space in our thinking. We mistake this perception for the likelihood of things going right.
Now let's compare this case to an argument I hear quite a lot these days. Consider the quite reasonable suggestion that GPT-3 is a rudimentary form of general intelligence. Given that it can write on a wide variety of topics, it certainly appears generally capable. Now consider one further assumption: the scaling hypothesis. We conclude that some descendant of GPT-3, given thousands or millions of times more computation, will naturally yield general AI.
I see no strong reason to doubt the narrow version of this thesis. I believe it's likely that, as training scales, we'll progressively see more general and more capable machine learning models that can do a ton of impressive things, both on the stuff we expect them to do well on, and some stuff we didn't expect.
But no matter how hard I try, I don't see any current way of making some descendant of GPT-3, for instance, manage a corporation.
One may reason that, as machine learning models scale and become more general, at some point this will just naturally yield the management skills required to run a company.
It's important to note that even if this were true, it wouldn't tell us much about how to extract those skills from the model. Indeed, GPT-3 may currently be skilled at many things that we nonetheless do not know how to make it actually perform.
Most importantly, notice the similarities between this reasoning and that of (my interpretation of) Claude Shannon. Shannon expected algorithmic progress in chess to transfer usefully to other domains. In my interpretation, he did this because the problems of chess were near to him, and the problems of language translation were far from him.
Similarly, the problem "write a well-written essay" is close to us. We can see concretely how to get a model to perform better at it, and we are much impressed by what we obtain by making progress. "Manage a corporation" is far. We're not really sure how to approach it, even if we could point out vague similarities between the two problems if we tried.
I don't mean to imply that we haven't made progress on the task of getting an AI to manage a corporation. I only mean that you can't just wish it away as a hard problem simply by imagining that we'll just get it for free as a result of making steady progress on something simpler and more concrete.
What other tasks do I think people might be incorrectly assuming we could get as a byproduct of progress on simpler things? Here's a partial list,
- As already stated, managing organizations and people.
- Complex general purpose robotics, of the type needed to win the RoboCup grand challenge.
- Long-term planning and execution, especially involving fine motor control and no guarantees about how the environment will be structured.
- Making original and profound scientific discoveries.
I won't claim that an AI can't be dangerous to people if it lacks these abilities. However, I do think that in order to pose an existential risk to humanity, or obtain a decisive strategic advantage over humans, AI systems would likely need to be capable enough to do at least one of these things.
Regulation will slow things down
Recently, Jason Crawford wrote on Roots of Progress,
In the 1950s, nuclear was the energy of the future. Two generations later, it provides only about 10% of world electricity, and reactor design hasn't fundamentally changed in decades.
As Crawford explains, the reason for this slow adoption is neither that nuclear plants are unsafe nor that they can't be built cheaply. Rather, burdensome regulation has raised production costs to a level where people would rather pay for other energy sources,
Excessive concern about low levels of radiation led to a regulatory standard known as ALARA: As Low As Reasonably Achievable. What defines “reasonable”? It is an ever-tightening standard. As long as the costs of nuclear plant construction and operation are in the ballpark of other modes of power, then they are reasonable.
This might seem like a sensible approach, until you realize that it eliminates, by definition, any chance for nuclear power to be cheaper than its competition. Nuclear can’t even innovate its way out of this predicament: under ALARA, any technology, any operational improvement, anything that reduces costs, simply gives the regulator more room and more excuse to push for more stringent safety requirements, until the cost once again rises to make nuclear just a bit more expensive than everything else. Actually, it’s worse than that: it essentially says that if nuclear becomes cheap, then the regulators have not done their job.
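The ratchet Crawford describes can be sketched as a toy model. Everything here is an assumption made for illustration: the function name `alara_ratchet`, the cost figures, and the rule that the regulator pushes costs back up to exactly a fixed margin above the competitor. It is not a model of any real regulatory process, only a way to see why cost-saving innovation cannot make nuclear cheaper under such a rule.

```python
from typing import List


def alara_ratchet(
    nuclear_cost: float,
    competitor_cost: float,
    innovation_savings: List[float],
    margin: float = 5.0,
) -> List[float]:
    """Toy model: return nuclear's cost after each cost-saving innovation.

    Assumed rule (hypothetical): whenever nuclear's cost drops below
    competitor_cost + margin, the regulator adds safety requirements until
    the cost is back at that level.
    """
    floor = competitor_cost + margin
    cost = nuclear_cost
    history = []
    for saving in innovation_savings:
        cost -= saving      # an innovation lowers the cost...
        if cost < floor:    # ...which makes stricter rules look "reasonable"
            cost = floor    # so the requirements tighten until cost is back up
        history.append(cost)
    return history


# Assumed numbers: nuclear starts at 120, the competition costs 100.
history = alara_ratchet(120.0, 100.0, [10.0, 20.0, 30.0])
print(history)
```

Under these assumptions, no sequence of savings ever brings the cost below the competitor's, which is the sense in which the standard rules out cheap nuclear "by definition".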
Crawford lays blame on the incentives of regulators. As he put it,
[The regulators] get no credit for approving new plants. But they do own any problems. For the regulator, there’s no upside, only downside. No wonder they delay.
In fact, these perverse incentives facing regulators have long been known by economists who favor deregulation. Writing in 1980, Milton and Rose Friedman gave the following argument in the context of FDA regulation,
It is no accident that the FDA, despite the best of intentions, operates to discourage the development and prevent the marketing of new and potentially useful drugs. Put yourself in the position of an FDA official charged with approving or disapproving a new drug. You can make two very different mistakes:
1. Approve a drug that turns out to have unanticipated side effects resulting in the death or serious impairment of a sizable number of persons.
2. Refuse approval of a drug that is capable of saving many lives or relieving great distress and that has no untoward side effects.
If you make the first mistake—approve a thalidomide—your name will be spread over the front page of every newspaper. You will be in deep disgrace. If you make the second mistake, who will know it?
Given the moral case here, it might come as a surprise that the effect of regulation on technological innovation has not generally been well studied. Philippe Aghion et al. recently published a paper saying as much in their introduction. Still, although we lack a large literature showing the role regulation plays in delaying technological development, it almost certainly plays one.
Regulation is arguably the main thing standing in the way of lots of futuristic technologies: human cloning, human genetic engineering, and climate engineering come to mind, just to name a few.
One might think that the AI industry is immune to such regulation, or nearly so. After all, the tech industry has historically experienced a lot of growth without much government interference. What reason is there for this to stop?
I offer two replies. The first reason is that governments of the world are already on the cusp of a concerted effort to regulate technology companies. A New York Times article from April 20th explains,
China fined the internet giant Alibaba a record $2.8 billion this month for anticompetitive practices, ordered an overhaul of its sister financial company and warned other technology firms to obey Beijing’s rules.
Now the European Commission plans to unveil far-reaching regulations to limit technologies powered by artificial intelligence.
And in the United States, President Biden has stacked his administration with trustbusters who have taken aim at Amazon, Facebook and Google.
Around the world, governments are moving simultaneously to limit the power of tech companies with an urgency and breadth that no single industry had experienced before. Their motivation varies. In the United States and Europe, it is concern that tech companies are stifling competition, spreading misinformation and eroding privacy; in Russia and elsewhere, it is to silence protest movements and tighten political control; in China, it is some of both.
The second reason is that, as AI becomes more capable, we'll likely increasingly see calls for it to be regulated. I should point out that I’m not restricting my analysis to government regulation; the very fact that the AI safety community exists, and that OpenAI and DeepMind hired people to work on safety, provides evidence that such calls for more caution will occur.
The slightest sign of danger was enough to stall nuclear energy development. I don’t see much reason to expect any different for AI.
Furthermore, many others and I have previously pointed out that in a continuous AI takeoff scenario, low-magnitude AI failures will happen before large-magnitude failures. It seems plausible to me that at some point, a significant AI failure will happen that triggers a national or even international panic, despite not posing any sort of imminent existential risk. In other words, I pretty much expect a Chernobyl disaster of AI, or at least a series of such disasters that will have more or less the same effect.
Combining all three of these effects, it's a bit difficult to see how we will get transformative AI developments in the next 50 years [edit: but not very difficult, see my update here]. Even accepting some of the more optimistic assumptions in e.g. Ajeya Cotra's Draft report on AI timelines, it still seems to me that these effects will add a few decades to our timelines before things get really interesting. So at present, my optimistic timelines look more like 25 or 30 years, rather than 10 or 15. But of course, smart people disagree with me here, and there's a ton of uncertainty, so I'm happy to find where I made mistakes.
Thanks for this post! I'll write a fuller response later, but for now I'll say: These arguments prove too much; you could apply them to pretty much any technology (e.g. self-driving cars, 3D printing, reusable rockets, smart phones, VR headsets...). There doesn't seem to be any justification for the 50-year number; it's not like you'd give the same number for those other techs, and you could have made exactly this argument about AI 40 years ago, which would lead to 10-year timelines now. You are just pointing out three reasons in favor of longer timelines and then concluding
Which seems unwarranted to me. I agree that the things you say push in the direction of longer timelines, but there are other arguments one could make that push in the direction of shorter timelines, and it's not like your arguments are so solid that we can just conclude directly from them that timelines are long--and specifically 50+ years long!
I suppose my argument has an implicit, "current forecasts are not taking these arguments into account." If people actually were taking my arguments into account, and still concluding that we should have short timelines, then this would make sense. But, I made these arguments because I haven't seen people talk about these considerations much. For example, I deliberately avoided the argument that according to the outside view, timelines might be expected to be long, since that's an argument I've already seen many people make, and therefore we can expect a lot of people to take it into account when they make forecasts.
Sure. I think my post is akin to someone arguing for a scientific theory. I'm just contributing some evidence in favor of the theory, not conducting a full analysis for and against it. Others can point to evidence against it, and overall we'll just have to sum over all these considerations to arrive at our answer.
I definitely agree that our timelines forecasts should take into account the three phenomena you mention, and I also agree that e.g. Ajeya's doesn't talk about this much. I disagree that the effect size of these phenomena is enough to get us to 50 years rather than, say, +5 years to whatever our opinion sans these phenomena was. I also disagree that overall Ajeya's model is an underestimate of timelines, because while indeed the phenomena you mention should cause us to shade timelines upward, there is a long list of other phenomena I could mention which should cause us to shade timelines downward, and it's unclear which list is overall more powerful.
On a separate note, would you be interested in a call sometime to discuss timelines? I'd love to share my overall argument with you and hear your thoughts, and I'd love to hear your overall timelines model if you have one.
I don't think technological deployment is likely to take that long for AIs. With a physical device like a car or fridge, it takes time for people to set up the factories and manufacture the devices. AI can be sent across the internet in moments. I don't know how long it takes Google to go from, say, an algorithm that detects streets in satellite images to the results showing up in Google Maps, but it's not anything like the decades it took those physical techs to roll out.
The slow roll-out scenario looks like this: AGI is developed using a technique that fundamentally relies on imitating humans, and requires lots of training data. There isn't nearly enough data from humans who are AI experts to make an AI an AI expert. The AI is about as good at AI research as the median human. Or maybe the 80th percentile human. I.e., no good at all. The AI design fundamentally requires custom hardware to run at reasonable speeds. Add in some political squabbling and it could take a fair few years before wide use, although there would still be huge economic incentive to create it.
The fast scenario is the rapidly self-improving superintelligence, where we have oodles of compute by the time we crack the algorithms. All the self-improvement happens very fast in software. Then the AI takes over the world. (I question that "a few weeks" is the fastest possible timescale for this.)
(For that matter, the curves on the right of the graph look steeper. It takes less time for an invention to be rolled out nowadays.)
For your second point, you can name biases that might make people underestimate timelines, and I can name biases that might make people overestimate timelines (e.g. failure to consider techniques not known to you). And it all turns into a bias-naming competition, which is hardly truth-tracking at all.
As for regulation, I think it's what people are doing in R&D labs, not what is rolled out, that matters. And that is harder to regulate. I also explicitly don't expect any AI Chernobyl. I don't strongly predict there won't be an AI Chernobyl either. I feel that if the relevant parties act with the barest modicum of competence, there won't be an AI Chernobyl. And the people being massively stupid will carry on being massively stupid after any AI Chernobyl.
Thanks for the nice post! Here's why I disagree :)
Normal technologies require (1) people who know how to use the technology, and (2) people who decide to use the technology. If we're thinking about a "real-deal AGI" that can do pretty much every aspect of a human job but better and cheaper, then (1) isn't an issue because the AGI can jump into existing human roles. It would be less like "technology deployment" and more like a highly-educated exquisitely-skilled immigrant arriving into a labor market. Such a person would have no trouble getting a job, in any of a million different roles, in weeks not decades. For (2), the same "real-deal AGI" would be able to start companies of its own accord, build factories, market products and services, make money, invest it in starting more companies, etc. etc. So it doesn't need anyone to "decide to use the technology" or to invest in the technology.
I think my main disagreement comes from my thinking of AGI development as being "mostly writing and testing code inside R&D departments", rather than "mostly deploying code to the public and learning from that experience". I agree that it's feasible and likely for the latter activity to get slowed down by regulation, but the former seems much harder to regulate for both political reasons and technical reasons.
The political is: It's easy to get politicians riled up about the algorithms that Facebook is actually using to influence people, and much harder to get politicians riled up about whatever algorithms Facebook is tinkering with (but not actually deploying) in some office building somewhere. I think there would only be political will once we start getting "lab escape accidents" with out-of-control AGIs self-replicating around the internet, or whatever, at which point it may well be too late already.
The technical is: A lot of this development will involve things like open-source frameworks to easily parallelize software, and easier-to-use faster open-source implementations of new algorithms, academic groups publishing papers, and so on. I don't see any precedent or feasible path for the regulation of these kinds of activities, even if there were the political will.
Not that we shouldn't develop political and technical methods to regulate that kind of thing (it seems worth trying to figure out), just that it seems extremely hard to do and unlikely to happen.
My own inside-view story (see here for example) is that human intelligence is based around a legible learning algorithm, and that researchers in neuroscience and AI are making good progress in working out exactly how that learning algorithm works, especially in the past 5 years. I'm not going to try to sell you on that story here, but fwiw it's a short-ish timelines story that doesn't directly rely on the belief that currently-popular deep learning models are very general, or even necessarily on the right track.
I broadly agree with these points, and (1) and (3) in particular lead me to shade the bio anchors estimates upwards by ~5 years (note they are already shaded up somewhat to account for these kinds of effects).
I don't really agree on (2).
I feel like if you were applying this argument to evolution, you'd conclude that humans would be unable to manage corporations, which seems too much. Humans seem to do things that weren't in the ancestral environment, why not GPTs, for the same reason?
You might say "okay, sure, at some level of scaling GPTs learn enough general reasoning that they can manage a corporation, but there's no reason to believe it's near". But one of the major points of the bio anchors framework is to give a reasonable answer to the question of "at what level of scaling might this work", so I don't think you can argue that current forecasts are ignoring (2).
Perhaps you just mean that most people aren't taking bio anchors into account and that's why (2) applies to them -- that seems plausible, I don't have strong beliefs about what other people are thinking.
Thanks for the useful comment.
Right. This is essentially the same way we might reply to Claude Shannon if he said that some level of brute-force search would solve the problem of natural language translation.
Figuring out how to make a model manage a corporation involves a lot more than scaling a model until it has the requisite general intelligence to do it in principle if its motivation were aligned.
I think it will be hard to figure out how to actually make models do stuff we want. Insofar as this is simply a restatement of the alignment problem, I think this assumption will be fairly uncontroversial around here. Yet, it's also a reason to assume that we won't simply obtain transformative models the moment they become theoretically attainable.
It might seem unfair that I'm including safety and control as an input in our model for timelines, if we're using the model to reason about the optimal time to intervene. But I think on an individual level it makes sense to just try to forecast what will actually happen.
Fwiw, the problem I think is hard is "how to make models do stuff that is actually what we want, rather than only seeming like what we want, or only initially what we want until the model does something completely different like taking over the world".
I don't expect that it will be hard to get models that look like they're doing roughly the thing we want; see e.g. the relative ease of prompt engineering or learning from human preferences. If I thought that were hard, I would agree with you.
I would guess that this is relatively uncontroversial as a view within this field? Not sure though.
(One of my initial critiques of bio anchors was that it didn't take into account the cost of human feedback, except then I actually ran some back-of-the-envelope calculations and it turned out it was dwarfed by the cost of compute; maybe that's your crux too?)
Sorry for replying to this comment 2 years late, but I wanted to discuss this part of your reasoning,
I think that's what I meant when I said "I think it will be hard to figure out how to actually make models do stuff we want". But more importantly, I think that's how most people will in fact perceive what it means to get a model to "do what we want".
Put another way, I don't think people will actually start using AI CEOs just because we have a language model that acts like a CEO. Large corporations will likely wait until they're very confident in its reliability, robustness, and alignment. (Although idk, maybe some eccentric investors will find the idea interesting, I just expect that most people will be highly skeptical without strong evidence that it's actually better than a human.)
I think this point can be seen pretty easily in discussion of driverless cars. Regulators are quite skeptical of Tesla's autopilot despite it seeming to do what we want in perhaps over 99% of situations.
If anything, I expect most people to be intuitively skeptical that AI is really "doing what we want" even in cases where it's genuinely doing a better job than humans, and doesn't merely appear that way on the surface. The reason is simple: we have vast amounts of informal data on the reliability of humans, but very little idea how reliable AI will be. That plausibly causes people to start with a skeptical outlook, and only accept AI in safety-critical domains when they've seen it accumulate a long track record of exceptional performance.
For these reasons, I don't fully agree that "one of the major points of the bio anchors framework is to give a reasonable answer to the question of 'at what level of scaling might this work'". I mean, I agree that this was what the report was trying to answer, but I disagree that it answered the question of when we will accept and adopt AI for various crucial economic activities, even if such systems were capable of automating everything in principle.
I want to distinguish between two questions:
(The key difference being that (1) is a statement about people's beliefs about reality, while (2) is a statement about reality directly.)
(For all of this I'm assuming that an AI CEO that does the job of CEO well until the point that it executes a treacherous turn counts as "performing the CEO task well".)
I'm very sympathetic to skepticism about question 1 on short timelines, and indeed as I mentioned I agree with your points (1) and (3) in the OP and they cause me to lengthen my timelines for TAI relative to bio anchors.
My understanding was that you are also skeptical about question 2 on short timelines, and that was what you were arguing with your point (2) on overestimating generality. That's the part I disagree with. But your response is talking about things that other people will believe, rather than about reality; I already agree with you on that part.
I think I understand my confusion, at least a bit better than before. Here's how I'd summarize what happened.
I had three arguments in this essay, which I thought of as roughly having the following form:
You said that (2) was already answered by the bio anchors model. I responded that bio anchors neglected how difficult it will be to develop AI safely. You replied that it will be easy to make models that seemingly do what we want, but that the harder part will be making models that actually do what we want.
My reply was trying to say that the difficulty of building TAI safely was already baked into (2). That might be a dubious reading of the actual textual argument for (2), but I think that interpretation is backed up by my initial reply to your comment.
The reason I framed my later reply as being about perceptions is that I think the requisite capability level at which people begin to adopt TAI is an important point about how long timelines will be, independent of (1) and (3). In other words, I was arguing that people's perceptions of the capability of AI will cause them to wait to adopt AI until it's fully developed in the sense I described above; it won't just delay the effects of TAI after it's fully developed, or before then because of regulation.
Furthermore, I assumed that you were arguing something along the lines of "people will adopt AI once it's capable of only seeming to do what we want", which I'm skeptical of. Hence my reply to you.
Since for point 2 you said "I'm assuming that an AI CEO that does the job of CEO well until the point that it executes a treacherous turn", I am not very skeptical of that right now. I think we could probably have AIs do something that looks very similar to what a CEO would do within, idk, maybe five years.
(Independently of all of this, I've updated towards medium rather than long timelines in the last two years, but mostly because of reflection on other questions, and because I was surprised by the rate of recent progress, rather than because I have fundamental doubts about the arguments I made here, especially (3), which I think is still underrated.
ETA: though also, if I wrote this essay today I would likely fully re-write section (2), since after re-reading it I now don't agree with some of the things I said in it. Sorry if I was being misleading by downplaying how poor some of those points were.)
My summary of your argument now would be:
If that's right, I broadly agree with all of these points :)
(I previously thought you were saying something very different with (2), since the text in the OP seems pretty different.)
FWIW I don't think you're getting things wrong here. I also have simply changed some of my views in the meantime.
That said, I think what I was trying to accomplish with (2) was not that alignment would be hard per se, but that it would be hard to get an AI to do very high-skill tasks in general, which included aligning the model, since otherwise it's not really "doing the task" (though as I said, I don't currently stand by what I wrote in the OP, as-is).
Planned summary for the Alignment Newsletter: