All of abergal's Comments + Replies

Extrapolating GPT-N performance

Planned summary for the Alignment Newsletter:

This post describes the author’s insights from extrapolating the performance of GPT on the benchmarks presented in the <@GPT-3 paper@>(@Language Models are Few-Shot Learners@). The author compares cross-entropy loss (which measures how good a model is at predicting the next token) with benchmark performance, normalized so that random performance maps to 0 and the maximum possible performance maps to 1. Since <@previous work@>(@Scaling Laws for Neural Language Models@) has shown that cross-entropy loss s

... (read more)
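
As a quick illustration of the normalization described in the summary above, here is a minimal sketch in Python (the function name and the benchmark numbers are mine, chosen for illustration; they are not from the post):

```python
def normalize_performance(score, random_baseline, max_score=1.0):
    """Map raw benchmark accuracy onto [0, 1], where 0 corresponds to
    random performance and 1 to the maximum possible performance."""
    return (score - random_baseline) / (max_score - random_baseline)

# Hypothetical example: a 4-way multiple-choice benchmark, so random
# guessing scores 0.25. A raw accuracy of 0.62 normalizes to ~0.49.
print(normalize_performance(0.62, random_baseline=0.25))
```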
2020 AI Alignment Literature Review and Charity Comparison

AI Impacts now has a 2020 review page so it's easier to tell what we've done this year-- this should be more complete / representative than the posts listed above. (I appreciate how annoying the continuously updating wiki model is.)

Larks (6mo): Thanks, added.
Draft report on AI timelines

From Part 4 of the report:

Nonetheless, this cursory examination makes me believe that it’s fairly unlikely that my current estimates are off by several orders of magnitude. If the amount of computation required to train a transformative model were (say) ~10 OOM larger than my estimates, that would imply that current ML models should be nowhere near the abilities of even small insects such as fruit flies (whose brains are 100 times smaller than bee brains). On the other hand, if the amount of computation required to train a transformative model were
... (read more)
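
To make the orders-of-magnitude reasoning in the quoted passage concrete, here is a toy sketch; all of the FLOP figures are placeholders chosen for illustration, not estimates from the report:

```python
import math

# Placeholder median estimate of training FLOP for a transformative model.
median_estimate_flop = 1e30

# Suppose the true requirement were ~10 OOM larger than the estimate.
shifted_estimate_flop = median_estimate_flop * 10**10

# Then a large 2020-era training run (placeholder: 1e24 FLOP) would sit
# 16 OOM below the requirement rather than 6 -- the kind of gap under
# which current models should be nowhere near even insect-level ability.
current_run_flop = 1e24
print(math.log10(shifted_estimate_flop / current_run_flop))  # 16.0
```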
Ajeya Cotra (9mo): Yes, it's assuming the scaling behavior follows the probability distributions laid out in Part 2, and then asking whether, conditional on that, the model size requirements could be off by a large amount.
Draft report on AI timelines

So exciting that this is finally out!!!

I haven't gotten a chance to play with the models yet, but thought it might be worth noting the ways I would change the inputs (though I haven't thought about it very carefully):

  • I think I have a lot more uncertainty about neural net inference FLOP/s vs. brain FLOP/s, especially given that the brain is significantly more interconnected than the average 2020 neural net-- probably closer to a 3-5 OOM standard deviation (see the sketch after this list).
  • I think I also have a bunch of uncertainty about algorithmic efficiency progress-- I could im
... (read more)
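
For intuition on what a 3-5 OOM standard deviation on brain FLOP/s would look like, here is a minimal sketch that treats the estimate as lognormal (normal in log10 space); the central estimate of 1e15 FLOP/s is a placeholder, not a number from the comment or the report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Express the uncertainty as a standard deviation in orders of magnitude:
# sample log10(brain FLOP/s) from a normal, then exponentiate.
central_log10 = 15.0  # placeholder central estimate of 1e15 FLOP/s
sigma_oom = 4.0       # midpoint of the suggested 3-5 OOM spread

samples = 10 ** rng.normal(central_log10, sigma_oom, size=100_000)

# With a 4 OOM standard deviation, the central 95% of the distribution
# spans roughly 16 orders of magnitude.
lo, hi = np.percentile(np.log10(samples), [2.5, 97.5])
print(f"95% interval: 1e{lo:.1f} to 1e{hi:.1f} FLOP/s")
```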
Ajeya Cotra (9mo): Thanks! I definitely agree that the proper modeling technique would involve introducing uncertainty on algorithmic progress, and that this uncertainty would be pretty wide; this is one of the most important few directions of future research (the others being better understanding effective horizon length and better narrowing model size).

In terms of uncertainty in model size, I personally find it somewhat easier to think about what the final spread should be in the training FLOP requirements distribution, since there's a fair amount of arbitrariness in how the uncertainty is apportioned between model size and scaling behavior. There's also semantic uncertainty about what it means to "condition on the hypothesis that X is the best anchor." If we're living in the world of "brain FLOP/s anchor + normal scaling behavior", then assigning a lot of weight to really small model sizes would wind up "in the territory" of the Lifetime Anchor hypothesis, and assigning a lot of weight to really large model sizes would wind up "in the territory" of the Evolution Anchor hypothesis, or go beyond the Evolution Anchor hypothesis.

I was roughly aiming for ±5 OOM uncertainty in training FLOP requirements on top of the anchor distribution, and then apportioned uncertainty between model size and scaling behavior based on which one seemed more uncertain.
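
One way to see the apportionment point in the reply above: if log10(training FLOP) is modeled as the sum of independent contributions from model size and scaling behavior, their variances add, so many different splits yield the same final spread. A toy sketch (the particular sigmas are mine, chosen only so the total comes out to the targeted ±5 OOM):

```python
import math

# Standard deviations (in OOM) of two independent log-space contributions
# to log10(training FLOP): model size and scaling behavior.
sigma_model_size = 4.0  # hypothetical: model size judged more uncertain
sigma_scaling = 3.0     # hypothetical: scaling behavior less uncertain

# Variances of independent terms add, so only the combined spread matters,
# not how it is apportioned between the two components.
sigma_total = math.sqrt(sigma_model_size**2 + sigma_scaling**2)
print(sigma_total)  # 5.0 OOM total
```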
OpenAI announces GPT-3

I'm a bit confused about this as a piece of evidence-- naively, it seems to me like not carrying the 1 would be a mistake that you would make if you had memorized the pattern for single-digit arithmetic and were just repeating it across the number. I'm not sure if this counts as "memorizing a table" or not.

Daniel Kokotajlo (1y): Excellent point! Well, they do get the answer right some of the time... it would be interesting to see how often they "remember" to carry the one vs. how often they "forget." It looks like the biggest model got basically 100% correct on 2-digit addition, so it seems that they mostly "remember."
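
For concreteness, here is a small sketch contrasting the two behaviors discussed above: digit-by-digit addition that propagates the carry, versus the hypothesized "memorized single-digit table" failure mode that drops it (both function names are mine, for illustration):

```python
def add_with_carry(a: str, b: str) -> str:
    """Grade-school addition: apply the single-digit table column by
    column, right to left, propagating the carry."""
    a, b = a.zfill(len(b)), b.zfill(len(a))
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

def add_forgetting_carry(a: str, b: str) -> str:
    """The hypothesized failure mode: apply the memorized single-digit
    table at each position, but never carry into the next column."""
    a, b = a.zfill(len(b)), b.zfill(len(a))
    return "".join(str((int(x) + int(y)) % 10) for x, y in zip(a, b))

print(add_with_carry("48", "27"))        # 75
print(add_forgetting_carry("48", "27"))  # 65: the "forgot to carry" error
```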