[Preprint] The Computational Limits of Deep Learning

by G Gordon Worley III, 21st Jul 2020

AI
Frontpage
This is a linkpost for https://arxiv.org/abs/2007.05558

"The Computational Limits of Deep Learning" by Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso

NB: As best I can tell, this is a preprint that has not been peer-reviewed or accepted for publication, so more than usual you'll have to make your own judgements about the quality of the results.

Abstract:

Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image recognition, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article reports on the computational demands of Deep Learning applications in five prominent application areas and shows that progress in all five is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.

A few additional details: the authors survey ML papers to estimate how much compute was required to achieve reported results, then extrapolate the trend lines to suggest we're nearing the limits of what is economically feasible under the current regime. They believe this implies we'll have to get more efficient if we want to see continued progress, such as by building more specialized and efficient hardware or by improving algorithms. My takeaway is that they believe most of the low-hanging fruit in ML gains has already been picked, and additional gains in capabilities will not come as easily as past gains.
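The basic shape of that extrapolation can be sketched in a few lines. This is a minimal illustration, not the paper's actual analysis: the datapoints below are hypothetical, and the paper fits richer regressions across five application areas. The idea is just to fit a power law between error and compute in log-log space and ask how much compute a target error level would require.

```python
import numpy as np

# Hypothetical datapoints (illustrative only, NOT from the paper):
# error rate achieved vs. training compute, in arbitrary units.
compute = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
error = np.array([0.30, 0.22, 0.16, 0.12, 0.09])

# Fit a power law: log10(error) = a * log10(compute) + b.
a, b = np.polyfit(np.log10(compute), np.log10(error), 1)

# Extrapolate: compute required to reach a 5% error rate.
target_error = 0.05
required_compute = 10 ** ((np.log10(target_error) - b) / a)
print(f"slope = {a:.3f}, required compute ~ {required_compute:.2e}")
```

Because the fitted slope is shallow (error falls slowly as compute grows), the extrapolated compute requirement blows up quickly for small improvements in error, which is the mechanism behind the paper's economic-infeasibility claim.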

The straightforward implication for safety is that, if this is true, we are less near x-risk territory than it might appear if you were to look only at the "numerator" of the trend lines (what we can do) without considering their "denominator" (how much it costs). Not that we are necessarily dramatically far from x-risk territory with ML, mind you, only that it's not obviously very near term, since the economic realities of deploying this technology will soon shift to naturally slow immediate progress absent significant effort or innovation.

1 comment:

I believe Gwern had some harsh words for this paper. (See below) I'd be interested to see a response from fans of the paper.

As I mentioned on Twitter, it's amazing that they wrote an entire paper trying to estimate performance scaling with compute, and ignored what looks like the entire literature doing actual controlled highly-precise experiments on scaling up fixed architectures (no citations to any of them that I could see) in favor of grabbing random datapoints from the overall literature.
Why should anyone pay any attention to their estimates, which are so unreliable and vague? Why would you do that and ignore (mini literature review follows): Sun et al 2017, Hestness et al 2017, Shallue et al 2018, McCandlish et al 2018, Rosenfeld et al 2019, Li et al 2020, Kaplan et al 2020, Roller et al 2020, Chen et al 2020a, Chen et al 2020b, Lepikhin et al 2020, and Huggingface 2020?
And the rest is not much better, like the de rigueur 'green' CO2 estimates (as if training DL actually emitted CO2, as if 'green' approaches aren't just doomed from the start as the most efficient NNs always start from research on the very large models they would like to rule out, as if large high-performance NNs aren't used in the real world in any way and do not replace even more CO2-intensive systems like say humans, as if CO2 costs are even the most important cost to begin with...). This isn't a paper that needs any extensive critique, let us say.