Planned summary for the Alignment Newsletter:
This post describes the author’s insights from extrapolating the performance of GPT on the benchmarks presented in the <@GPT-3 paper@>(@Language Models are Few-Shot Learners@). The author compares cross-entropy loss (which measures how good a model is at predicting the next token) with benchmark performance normalized to the difference between random performance and the maximum possible performance. Since <@previous work@>(@Scaling Laws for Neural Language Models@) has shown that cross-entropy loss falls smoothly and predictably as models are scaled up, this relationship can be used to extrapolate how benchmark performance would change for models larger than GPT-3.
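Concretely, I read the normalization as mapping random-chance performance to 0 and the maximum possible performance to 1. A minimal sketch of that normalization (the function name and the example numbers are mine, just for illustration):

```python
def normalize_performance(raw: float, random_perf: float, max_perf: float) -> float:
    """Rescale raw benchmark performance so that random-chance performance
    maps to 0 and the maximum possible performance maps to 1."""
    return (raw - random_perf) / (max_perf - random_perf)

# e.g. a 4-way multiple-choice benchmark: random guessing gives 25% accuracy
print(normalize_performance(0.70, 0.25, 1.00))  # 0.6
```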
AI Impacts now has a 2020 review page so it's easier to tell what we've done this year-- this should be more complete / representative than the posts listed above. (I appreciate how annoying the continuously updating wiki model is.)
From Part 4 of the report:
Nonetheless, this cursory examination makes me believe that it’s fairly unlikely that my current estimates are off by several orders of magnitude. If the amount of computation required to train a transformative model were (say) ~10 OOM larger than my estimates, that would imply that current ML models should be nowhere near the abilities of even small insects such as fruit flies (whose brains are 100 times smaller than bee brains). On the other hand, if the amount of computation required to train a transformative model were ~10 OOM smaller than my estimates, that would imply that it should already be possible to train a transformative model with currently available computation, which does not appear to be the case.
So exciting that this is finally out!!!
I haven't gotten a chance to play with the models yet, but thought it might be worth noting the ways I would change the inputs (though I haven't thought about it very carefully):
I'm a bit confused about this as a piece of evidence -- naively, it seems to me like not carrying the 1 is the kind of mistake you would make if you had memorized the pattern for single-digit arithmetic and were just repeating it across the number. I'm not sure whether this counts as "memorizing a table" or not.
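To make the "repeating the single-digit pattern across the number" failure concrete, here's a minimal sketch of column-wise addition with the carry dropped (my own illustration, not code or data from the paper):

```python
def add_without_carry(a: int, b: int) -> int:
    """Add two non-negative integers column by column, keeping each column's
    sum mod 10 and silently dropping the carry -- the 'not carrying the 1'
    error pattern."""
    result, place = 0, 1
    while a > 0 or b > 0:
        digit_sum = (a % 10 + b % 10) % 10  # the carry is discarded here
        result += digit_sum * place
        a, b, place = a // 10, b // 10, place * 10
    return result

print(29 + 57)                    # 86 (correct)
print(add_without_carry(29, 57))  # 76 (the carry from 9 + 7 is lost)
```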