As I mentioned on Twitter, it's amazing that they wrote an entire paper trying to estimate performance scaling with compute, and ignored what looks like the entire literature doing actual controlled highly-precise experiments on scaling up fixed architectures (no citations to any of them that I could see) in favor of grabbing random datapoints from the overall literature.

Why should anyone pay any attention to their estimates, which are so unreliable and vague? Why would you do that and ignore (mini literature review follows): Sun et al 2017, Hestness et al 2017, Shallue et al 2018, McCandlish et al 2018, Rosenfeld et al 2019, Li et al 2020, Kaplan et al 2020, Roller et al 2020, Chen et al 2020a, Chen et al 2020b
, Lepikhin et al 2020, and Huggingface 2020?

And the rest is not much better, like the de rigeur 'green' CO2 estimates (as if training DL actually emitted CO2, as if 'green' approaches aren't just doomed from the start as the most efficient NNs always start from research on the very large models they would like to rule out, as if large high-performance NNs aren't used in the real world in any way and do not replace even more CO2-intensive systems like say humans, as if CO2 costs are even the most important cost to begin with...). This isn't a paper that needs any extensive critique, let us say.

[-]weverka3y00

Gwern asks"Why would you do that and ignore (mini literature review follows):"

Thompson did not ignore the papers Gwern cites. A number of them are in Thompson's tables comparing prior work on scaling. Did Gwern tweet this criticism without even reading Thompson's paper?

[-]gwern3y*30

I did read it, and he did ignore them. Do you really think I criticized a paper publicly in harsh terms for not citing 12 different papers without even checking the bibliography or C-fing the titles/authors? Please look at the first 2020 paper version I was criticizing in 16 July 2020, when I wrote that comment, and don't lazily misread the version posted 2 years later on 27 July 2022 which, not being a time traveler, I obviously could not have read or have been referring to (and which may well have included those refs because of my comments there & elsewhere).

(Not that I am impressed by their round 2 stuff which they tacked on - but at least now they acknowledge that prior scaling research exists and try to defend their very different approach at all.)

Moderation Log

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

4

[Preprint] The Computational Limits of Deep Learning

4