
This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views. A draft was sent to Robin Hanson for review but received no response.

Summary

• Robin Hanson estimates the time until human-level AI by surveying experts about the percentage of progress toward human-level performance that has occurred in their particular subfield over the last 20 years, and dividing the number of years by that percentage.
• Such surveys look back on a period of extremely rapid growth of compute from both hardware improvements and more recently skyrocketing spending.
• Hanson favors using estimates from subsets of researchers with lower progress estimates to infer AI timelines requiring centuries' worth of recent growth, implying that truly extraordinary sustained compute growth would be necessary to surpass human performance.
• Extrapolated compute levels range from very large to astronomically large compared to the neural computation that took place in evolution on Earth, and thus likely far overestimate AI requirements and timelines.

• AI Impacts estimates that, from 2011 to 2017, $/FLOPS fell by 10x every 10-16 years, around 15% to 25% a year.
• Bloom et al. estimate that semiconductor R&D efforts have grown by a factor of 18 from 1971 to 2020, around 6.8% a year.
• Amodei and Hernandez estimate that, from 2012 to 2018, the amount of compute used in the largest AI training runs increased 10x every 11.2 months, around 1180% a year.
• AI Index Report estimates that global corporate investment in AI was $68 billion in 2020, up from $13 billion in 2015, an average increase of 39% a year.
• The World Bank estimates that world GDP has grown an average of 3.5% a year from 1961 to 2019.
• Besiroglu estimates that, from 2012 to 2020, the effective number of researchers in ML rose by 10x every 4-8 years, around 33% to 78% a year.

Extrapolating past input growth yields ludicrously high estimates of the resource requirements for human level AI performance

Hanson’s survey was conducted in 2012, so we must use trends from the two decades prior when extrapolating. We conservatively ignore the major historical growth in spending on AI compute as a fraction of the economy, and especially the surge in investment in large models that drove Amodei and Hernandez’s finding of annual 10x growth in compute used in the largest deep learning models over several years. Accounting for that would yield even more extreme estimates in the extrapolation.

Extrapolating world GDP’s historical 3.5% growth for 372 years yields a 10^5x increase in the amount of money spent on the largest training run. Extrapolating the historic 10x fall in $/FLOP every 7.7 years for 372 years yields a 10^48x increase in the amount of compute that can be purchased for that much money (we recognize that this extrapolation goes past physical limits). Together, 372 years of world GDP growth and Moore’s law yields a 10^53x increase in the amount of available compute for a single training run. Assuming GPT-3 represents the current frontier at 10^23 floating point operations (FLOP), multiplying suggests that 10^76 FLOP of compute will be available for the largest training run in 2393.
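As a rough sanity check on this arithmetic, here is a minimal back-of-the-envelope sketch, assuming only the growth rates and the 10^23 FLOP GPT-3 baseline quoted above (the constant and function names are just illustrative):

```python
# Back-of-the-envelope extrapolation of compute available for the largest
# training run, using the growth rates quoted above (illustrative only).

GDP_GROWTH_PER_YEAR = 1.035           # ~3.5%/year world GDP growth
YEARS_PER_10X_FLOP_PER_DOLLAR = 7.7   # historic 10x fall in $/FLOP every ~7.7 years
CURRENT_FRONTIER_FLOP = 1e23          # GPT-3-scale training run

def extrapolated_training_flop(years: float) -> float:
    """FLOP available for the largest training run after `years` of trend growth."""
    spending_growth = GDP_GROWTH_PER_YEAR ** years                          # ~10^5x at 372 years
    flop_per_dollar_growth = 10 ** (years / YEARS_PER_10X_FLOP_PER_DOLLAR)  # ~10^48x at 372 years
    return CURRENT_FRONTIER_FLOP * spending_growth * flop_per_dollar_growth

print(f"{extrapolated_training_flop(372):.1e}")  # ~7e76, i.e. the ~10^76 figure above after rounding
```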

10^76 FLOP is vast relative to the evolutionary process that produced animal and human intelligence on Earth, and ludicrous overkill for existing machine learning methods to train models vastly larger than human brains:[1]

• Cotra (2020) estimates that the total amount of computation done in animal nervous systems over the course of our evolution was 10^41 FLOP. 10^76 FLOP is enough to run that evolutionary process about 10^35 times over.
• One illustration of this is that for non-reversible computers, the thermodynamic Landauer limit means that 10^76 FLOP would require vastly more energy than has been captured by all living things in the history of Earth. Landauer's principle requires that non-reversible computers at 310 Kelvin need more than 3 x 10^-21 J for each bit-erasure. Carlsmith (2020) tentatively suggests ~1 bit-erasure per FLOP, implying roughly 3 x 10^55 J to perform 10^76 FLOP. About 10^17 J/s of sunlight strikes the Earth, so roughly 3 x 10^38 s, around 10^21 times the age of the universe, of 100% efficient, maximum-coverage terrestrial solar energy would be needed to power 10^76 FLOP of irreversible computing (a back-of-the-envelope version of this arithmetic appears after this list).
• Carlsmith (2020) estimates “it [is] more likely than not that 10^15 FLOP/s is enough to perform tasks as well as the human brain”. 10^76 FLOP of training compute is 10^61 times that per-second inference cost, i.e. enough to sustain a 10^61:1 ratio of training compute to brain-level inference compute.
• Roughly approximating Cotra (2020)’s estimates, models can be trained with one datapoint per parameter at 10 FLOP per parameter per datapoint, suggesting 10^76 FLOP is enough to train a model with roughly 3 x 10^37 parameters.
• Extrapolating scaling laws in model performance yields enormous improvements, and historically, tasks with previously flat or zero performance have yielded rapid gains once models became capable enough to make any progress on the problem at all.
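A minimal sketch of the arithmetic behind these comparisons, using the constants quoted in the list above plus the ~4.4 x 10^17 s age of the universe (the variable names are illustrative, not part of the original analysis):

```python
# Rough arithmetic behind the comparisons above (illustrative only).

TOTAL_FLOP = 1e76               # extrapolated largest training run
EVOLUTION_FLOP = 1e41           # Cotra (2020): neural computation over evolutionary history
LANDAUER_J_PER_ERASURE = 3e-21  # Landauer limit at ~310 K
ERASURES_PER_FLOP = 1           # Carlsmith (2020)'s tentative figure
SUNLIGHT_J_PER_S = 1e17         # sunlight striking Earth
AGE_OF_UNIVERSE_S = 4.4e17      # ~13.8 billion years
BRAIN_FLOP_PER_S = 1e15         # Carlsmith (2020): human-brain-level task performance

print(TOTAL_FLOP / EVOLUTION_FLOP)          # ~1e35 reruns of evolution's neural computation
energy_j = TOTAL_FLOP * ERASURES_PER_FLOP * LANDAUER_J_PER_ERASURE
print(energy_j)                             # ~3e55 J for irreversible computing
print(energy_j / SUNLIGHT_J_PER_S / AGE_OF_UNIVERSE_S)  # ~7e20 ages of the universe of full sunlight
print(TOTAL_FLOP / BRAIN_FLOP_PER_S)        # 1e61 seconds of brain-level inference compute
print((TOTAL_FLOP / 10) ** 0.5)             # ~3e37 parameters at 1 datapoint/param, 10 FLOP/param/datapoint
```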

Some of Hanson’s writing suggests he is indeed endorsing these sorts of requirements. E.g. in one post he writes:

For example, at past rates of [usual artificial intelligence (UAI)] progress it should take two to four centuries to reach human level abilities in the typical UAI subfield, and thus even longer in most subfields. Since the world economy now doubles roughly every fifteen years, that comes to twenty doublings in three centuries. If ems show up halfway from now to full human level usual AI, there’d still be ten economic doublings to go, which would then take ten months if the economy doubled monthly. Which is definitely faster UAI progress...Thus we should expect many doublings of the em era after ems and before human level UAI

This seems to be saying that the timeline estimates he is using are indeed based on input growth rather than serial time, and so reaching AGI requires multiplying the log input growth (the doublings referred to above) of the past 20-year period many times over.

Conclusion

So on the object-level we can reject such estimates of the resource requirements for human-level performance. If extrapolating survey responses about fractional progress per Hanson yields such absurdities, we should instead believe that respondents’ subjective progress estimates will accelerate as a function of resource inputs. In particular, that they would be end-loaded, putting higher weight on final orders of magnitude in input growth. This position is supported by the enormous differences in cognitive capabilities between humans and chimpanzees despite less than an order of magnitude difference in the quantity of brain tissue. It seems likely that ‘chimpanzee AI’ would be rated as a lot less than 90% progress towards human level performance, but chimpanzees only appeared after 99%+ of the timespan of evolution, and 90%+ of the growth in brain size.

This view also reconciles better with the survey evidence reporting acceleration and direct timeline estimates much shorter than Hanson’s.

For the more recent survey estimate of 142 years of progress, the same assumptions yield 10^43 FLOP (more with incorporation of past growth in spending). This is more arguable but still extremely high: it implies evolutionary levels of compute without any of the obvious advantages of intentional human design providing major efficiencies, and with scaling much worse than observed for deep learning today.
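The same back-of-the-envelope extrapolation, run for 142 years instead of 372 (a sketch under the same assumed growth rates as before):

```python
# 142 years at ~3.5%/yr GDP growth and a 10x fall in $/FLOP every 7.7 years,
# starting from a 10^23 FLOP frontier training run.
print(f"{1e23 * 1.035 ** 142 * 10 ** (142 / 7.7):.1e}")  # ~3.7e43, i.e. the ~10^43 figure above
```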

The aggregate and deep-learning-specific linear extrapolations over several decades still suffer from a version of this problem, primarily because of the unsustainably rapid growth of expenditures: e.g., it is impossible to maintain annual 10x growth in compute for the largest models for 30 years. While those respondents report much more rapid progress and acceleration than the Hanson survey respondents, we would still expect more acceleration in subjective progress estimates as we come closer to across-the-board superhuman performance.

1. We don’t think that anthropic distortions conceal large amounts of additional difficulty because of evolutionary timings and convergent evolution, combined with already existing computer hardware and software’s demonstrated capabilities. See Shulman and Bostrom (2012) for more details. ↩︎


Planned summary:

One [methodology](https://www.overcomingbias.com/2012/08/ai-progress-estimate.html) for forecasting AI timelines is to ask experts how much progress they have made to human-level AI within their subfield over the last T years. You can then extrapolate linearly to see when 100% of the problem will be solved. The post linked above collects such estimates, with a typical estimate being 5% of a problem being solved in the twenty year period between 1992 and 2012. Overall these estimates imply a timeline of [372 years](https://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/).

This post provides a reductio argument against this pair of methodology and estimate. The core argument is that if you linearly extrapolate, then you are effectively saying “assume that business continues as usual: then how long does it take”? But “business as usual” in the case of the last 20 years involves an increase in the amount of compute used by AI researchers by a factor of ~1000, so this effectively says that we’ll get to human-level AI after a 1000^{372/20} = 10^56 increase in the amount of available compute. (The authors do a somewhat more careful calculation that breaks apart improvements in price and growth of GDP, and get 10^53.)

This is a stupendously large amount of compute: it far dwarfs the amount of compute used by evolution, and even dwarfs the maximum amount of irreversible computing we could have done with all the energy that has ever hit the Earth over its lifetime (the bound comes from [Landauer’s principle](https://en.wikipedia.org/wiki/Landauer%27s_principle)).

Given that evolution _did_ produce intelligence (us), we should reject the argument. But what should we make of the expert estimates then? One interpretation is that “proportion of the problem solved” behaves more like an exponential, because the inputs are growing exponentially, and so the time taken to do the last 90% can be much less than 9x the time taken for the first 10%.

Planned opinion:

This seems like a pretty clear reductio to me, though it is possible to argue that this argument doesn’t apply because compute isn’t the bottleneck, i.e. even with infinite compute we wouldn’t know how to make AGI. (That being said, I mostly do think we could build AGI if only we had enough compute; see also <@last week’s highlight on the scaling hypothesis@>(@The Scaling Hypothesis@).)

"Overall these estimates imply a timeline of [372 years](https://aiimpacts.org/surveys-on-fractional-progress-towards-hlai/)."

That was only for Hanson's convenience sample; other surveys using the method gave much shorter timelines, as discussed in the post.

Ah, fair point, looking back at this summary I probably should have clarified that the methodology could be applied with other samples and those look much less long.

FWIW, Hanson has elsewhere promoted the idea that algorithmic progress is primarily due to hardware progress. Relevant passage:

Maybe there are always lots of decent ideas for better algorithms, but most are hard to explore because of limited computer hardware. As hardware gets better, more new ideas can be explored, and some of them turn out to improve on the prior best algorithms. This story seems to at least roughly fit what I’ve heard about the process of algorithm design.

So he presumably would endorse the claim that HLMI will likely require several tens of OOM more compute than we currently have, but that a plateauing in other inputs (such as the number of AI researchers) won't be as relevant. (Here's also another post of Hanson's where he endorses a somewhat related claim that we should expect exponential increases in hardware to translate to ~linear social impact and rate of automation.)