Hi John, I think I remember that presentation -- the reason the graph there was quite bimodal is because the Lifetime Anchor I was using at the time was simply assuming ~1x human lifetime levels of computation. In the current model, I'm assuming ~1000x human lifetime levels of computation, because ~1x seemed like a much less likely version of that anchor. The code in the quantitative model will let you see the untruncated version of the distribution, and it looks a lot more smooth now, though still a modest bump.
Also, apologies for such a late reply, I don't get email notifications for comments and haven't been checking regularly!
Thanks! No need to wait for a more official release (that could take a long time since I'm prioritizing other projects).
Yeah, I agree there is room for spending to be "irrational", though I would guess this is more likely in the direction of spending less than the "rational" amount rather than more, because developing TAI could be unprecedentedly profitable and companies' spending may be limited by capital constraints.
Thanks Ben, this is right!
Yeah, I considered pegging spending to a fraction of GWP instead of a fraction of GDP, but found that when I did this I wanted to push the fraction down because I felt that even though companies are getting increasingly globalized, coordination at the world-scale would probably still be thinner than coordination at the scale of something nation-sized (even if it's not actually a literal nation). Ultimately, I just went with GDP because there are more reference points for it.
I feel pretty uncertain about this though, and think there's a lot of room for a more detailed inside-view projection on willingness-to-spend by a firm. We could calculate this by making assumptions about the global surplus created by a transformative model (easily calculable from the definition), the amount of that profit that a firm would capture if it trained a transformative model, and the size of the frontier firm over time (which could be pegged to the global economy or potentially pegged to estimates of profits from training smaller models). We could then back out what a rational firm should be willing to invest.
Yes, it's assuming the scaling behavior follows the probability distributions laid out in Part 2, and then asking whether conditional on that the model size requirements could be off by a large amount.
Thanks! Agree that functional form uncertainty is a big deal here; I think that implicitly this uncertainty is causing me to up-weight Short Horizon Neural Network more than I otherwise would, and also up-weight "Larger than all hypotheses" more than I otherwise would.
With that said, I do predict that in clean artificial cases (which may or may not be relevant), we could demonstrate linear scaling. E.g., consider the case of inserting a frame of static or a blank screen in between every normal frame of an Atari game or StarCraft game -- I'd expect that modifying the games in this way would straightforwardly double training computation requirements.
I agree that full distribution information is very valuable, although I consider medians to be important as well. The spreadsheet linked in the report provides the full distribution implied by my views for the probability that the amount of computation required to train a transformative model is affordable, although it requires some judgment to translate that into P(TAI), because there may be other bottlenecks besides computation and there may be other paths to TAI besides training a transformative model. I'd say it implies somewhere between 2031 and 2036 is the year by which there is a 10% chance of TAI.
As I said in a reply to Daniel above, the way to express the view that a brain-sized GPT model would constitute TAI is to assign a lot of weight to the Short Horizon Neural Network hypothesis, potentially along with shifting narrowing the effective horizon length. I think this is plausible, but don't believe we should have a high probability on this because I expect on priors that we would need longer effective horizon lengths than GPT-3, and I don't think that evidence from the GPT-3 paper or follow on papers have provided clear evidence to the contrary.
In my best guess inputs, I assign a 25% probability collectively to the Short Horizon Neural Network and Lifetime Anchor hypotheses; in my aggressive inputs I assign 50% probability to these two hypotheses collectively. In both cases, probabilities are smoothed to a significant extent because of uncertainty in model size requirements and scaling, with substantial weight on smaller-than-brain-sized models and larger-than-brain-sized models.
Thanks! I definitely agree that the proper modeling technique would involve introducing uncertainty on algorithmic progress, and that this uncertainty would be pretty wide; this is one of the most important few directions of future research (the others being better understanding effective horizon length and better narrowing model size).
In terms of uncertainty in model size, I personally find it somewhat easier to think about what the final spread should be in the training FLOP requirements distribution, since there's a fair amount of arbitrariness in how the uncertainty is apportioned between model size and scaling behavior. There's also semantic uncertainty about what it means to "condition on the hypothesis that X is the best anchor." If we're living in the world of "brain FLOP/s anchor + normal scaling behavior", then assigning a lot of weight to really small model sizes would wind up "in the territory" of the Lifetime Anchor hypothesis, and assigning a lot of weight to really large model sizes would wind up "in the territory" of the Evolution Anchor hypothesis, or go beyond the Evolution Anchor hypothesis.
I was roughly aiming for +- 5 OOM uncertainty in training FLOP requirements on top of the anchor distribution, and then apportioned uncertainty between model size and scaling behavior based on which one seemed more uncertain.
Thanks Daniel! Quick replies: