Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, now OpenAI Futures/Governance team. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html


AI Timelines
Takeoff and Takeover in the Past and Future

Wiki Contributions


Is there a convenient way to make "sealed" predictions?

Also, it's a lot easier to fake by writing 10 letters with 10 different predictions and then burning the ones that don't come true.

The Commitment Races problem

I agree with all this I think.

This is why I said commitment races happen between consequentialists (I defined that term more narrowly than you do; the sophisticated reasoning you do here is nonconsequentialist by my definition). I agree that agents worthy of the label "rational" will probably handle these cases gracefully and safely. 

However, I'm not yet supremely confident that the AGIs we end up building will handle these cases gracefully and safely. I would love to become more confident & am looking for ways to make it more likely. 

If today you go around asking experts for an account of rationality, they'll pull off the shelf CDT or EDT or game-theoretic rationality (nash equilibria, best-respond to opponent) -- something consequentialist in the narrow sense. I think there is a nonzero chance that the relevant AGI will be like this too, either because we explicitly built it that way or because in some young dumb early stage it (like humans) picks up ideas about how to behave from its environment. Or else maybe because narrow-consequentialism works pretty well in single-agent environments and many muti-agent environments too, and maybe by the time the AGI is able to self-modify to something more sophisticated it is already thinking about commitment races and already caught in their destructive logic.

(ETA: Insofar as you are saying: "Daniel, worrying about this is silly, any AGI smart enough to kill us all will also be smart enough to not get caught in commitment races" then I say... I hope so! But I want to think it through carefully first; it doesn't seem obvious to me, for the above reasons.)

GPT-3 and concept extrapolation

You would behave the exact same way as GPT-3, were you to be put in this same challenging situation. In fact I think you'd do worse; GPT-3 managed to get quite a few words actually reversed whereas I expect you'd just output gibberish. (Remember, you only have about 1 second to think before outputting each token. You have to just read the text and immediately start typing.)

PaLM in "Extrapolating GPT-N performance"

Thanks Lanrian and Gwern! Alas that my quick-and-dirty method is insufficient.

PaLM in "Extrapolating GPT-N performance"

You may be interested in this image. I would be grateful for critiques; maybe I'm thinking about it wrong?

PaLM in "Extrapolating GPT-N performance"

You calculated things for the neural network brain size anchor; now here's the peformance scaling trend calculation (I think):

I took these graphs from the Chinchilla paper and then made them transparent and superimposed them on one another and then made a copy on the right to extend the line. And I drew some other lines to extend them.

Eyeballing this graph it looks like whatever performance we could achieve with 10^27 FLOPs under the Kaplan scaling laws, we can now achieve with 10^25 FLOPs. (!!!) This is a big deal if true. Am I reasoning incorrectly here?

If this is anywhere close to correct, then the distinction you mention between two methods of getting timelines -- "Assume it happens when we train a brain-sized model compute-optimally" vs. "assume it happens when we get to superhuman performance on this ensemble of benchmarks that we already have GPT trends for" becomes even more exciting and important than I thought! It's like, a huge huge crux, because it basically makes for a 4 OOM difference!

EDIT: To be clear, if this is true then I think I should update away from the second method, on the grounds that it predicts we are only about 1 OOM away and that seems implausible.

PaLM in "Extrapolating GPT-N performance"

Cool. Yep, that makes sense. I'd love to see those numbers if you calculate them!

PaLM in "Extrapolating GPT-N performance"

So then... If before we looked at the Kaplan scaling and thought e.g. 50% chance that +6 OOMs would be enough... now we correct for the updated scaling laws and think 50% chance that, what, +4 OOMs would be enough? How big do you think the adjustment would be? (Maybe I can work it out by looking at some of those IsoX graphs in the paper?)

PaLM in "Extrapolating GPT-N performance"

The difference between Chinchilla and Gopher was small but noticeable. Since the Kaplan and DM optimal scaling trajectories are like two lines with different slopes, should we perhaps expect the difference to get larger at greater scales?

Load More