Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, now works at Center on Long-Term Risk. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Sequences

AI Timelines
Takeoff and Takeover in the Past and Future

Comments

Paths To High-Level Machine Intelligence

I think this sequence of posts is underrated/underappreciated. I think this is because (A) it's super long and dense, and (B) it's mostly a summary/distillation/textbook thingy rather than advancing new claims or arguing for something controversial. As a result of (A) and (B), perhaps it struggles to hold people's attention all the way through.

But that's a shame, because this sort of project seems pretty useful to me. It seems like the community should be throwing money and interns at you, so that you can build a slick interactive website that displays the full graph (perhaps with expandable/collapsible subsections) and shows the relevant paragraphs of text when you hover over each node. That would just be Phase 1; Phase 2 would be adding mathematical relationships to the links and numbers to the nodes, so that people can input their own numbers and the whole system updates itself to spit out answers about the nodes at the end. The end result would be like a wiki + textbook, only better organized and more fun, while simultaneously being the most detailed and comprehensive quantitative model of AI stuff ever.
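
To gesture at what I mean by Phase 2, here's a minimal sketch in Python. The node names, numbers, and relationships are all made up for illustration; nothing here is from the actual model.

```python
import math

class Node:
    """One node in the model graph: either a user-set number (a leaf)
    or a value computed from its parent nodes."""
    def __init__(self, name, parents=None, fn=None, value=None):
        self.name = name              # label shown when hovering over the node
        self.parents = parents or []  # upstream nodes this one depends on
        self.fn = fn                  # how the parent values combine
        self.value = value            # user-supplied number, for leaves

    def evaluate(self):
        if self.fn is None:           # leaf: just return the user's number
            return self.value
        return self.fn(*[p.evaluate() for p in self.parents])

# Toy relationship; the structure and numbers are placeholders, not claims.
compute_growth = Node("compute growth factor per year", value=3.0)
oom_gap = Node("OOMs of compute still needed", value=6.0)
years_needed = Node(
    "years until enough compute",
    parents=[oom_gap, compute_growth],
    fn=lambda gap, growth: gap / math.log10(growth),
)

print(round(years_needed.evaluate(), 1))  # re-run whenever a user edits an input
```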

[AN #164]: How well can language models write code?
I agree that humans would do poorly in the experiment you outline. I think this shows that, like the language model, humans-with-one-second do not "understand" the code.

Haha, good point -- yes. I guess what I should say is: since humans would have performed just as poorly on this experiment, it doesn't count as evidence that e.g. "current methods are fundamentally limited" or "artificial neural nets can't truly understand concepts in the ways humans can" or "what goes on inside ANNs is fundamentally a different kind of cognition from what goes on inside biological neural nets" or whatnot.

[AN #164]: How well can language models write code?

Thanks again for these newsletters and summaries! I'm excited about the flagship paper.


First comment: I don't think their experiment about code execution is much evidence re "true understanding."

Recall that GPT-3 has 96 layers, and the biggest model used in this paper was smaller than GPT-3. Each pass through the network is therefore loosely equivalent to less than one second of subjective time, by comparison with the human brain, which I think goes through something like 100 serial operations per second? Could be a lot more, I'm not sure. https://aiimpacts.org/rate-of-neuron-firing/#Maximum_neural_firing_rates
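
Rough arithmetic, using that ~100 serial ops/sec figure as a stand-in:

```python
# Back-of-envelope version of the comparison above. 100 serial ops/sec is
# a rough stand-in for the brain, not a precise estimate.
model_layers = 96             # GPT-3's depth; the paper's models are smaller
brain_serial_ops_per_sec = 100

# ~0.96, i.e. less than one subjective second per forward pass
print(model_layers / brain_serial_ops_per_sec)
```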

So, the relevant comparison should be: give a human the same test. Show them some code and give them 1 second to respond with an answer (or the first token of an answer, and then 1 second for the second token, and so forth). See how well they do at predicting the code's output. I predict that they'd also do poorly, probably <50% accuracy. I claim that this passage from the paper inadvertently supports my hypothesis:

Including test cases and natural language descriptions in the prompt lead to the highest overall performance—higher than using the code itself. Because the code unambiguously describes the semantics, whereas test cases do not, this suggests that models are in some sense not really “reading” the source code and using it to execute. Models trained on general text corpora may be better at inducing patterns from as few as two input-output examples than they are at predicting the execution of code.

Second comment: Speculation about scaling trends:

Extrapolating from Figure 3, it seems that an AI which can solve (via at least one sample) approximately 100% of the coding tasks in this set, without even needing fine-tuning, would require +2 OOMs of parameters. That would probably cost about $5B to train, once you factor in the extra data required but also the lower prices and algorithmic improvements since GPT-3. Being almost 2 OOMs bigger than GPT-3, it might be expected to cost $6 per 1,000 tokens, which would make it pretty expensive to use (especially at full strength, where it generates multiple samples and then picks the best one). I think it might still find an economic niche, though: you could have a tiered system where a smaller model attempts a solution first and you only call up the big model if that fails, then keep generating samples until you get one that works. On average the number of samples you'd need to generate would be small, and it would only cost you multiple dollars for the toughest few percent of cases. Such a service could then be used by well-paid programmers for whom the time savings are worth it.
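
To make the tiered-service idea concrete, here's a sketch; small_model, big_model, and passes_tests are just stand-in stubs, not anyone's actual API.

```python
import random

# Sketch of the tiered service described above. The model calls and the
# test harness are stand-in stubs, not a real API.

def small_model(task):
    return f"cheap attempt at {task}"

def big_model(task):
    return f"expensive attempt at {task}"

def passes_tests(task, candidate):
    # Stand-in for running the task's unit tests on the candidate code.
    return random.random() < 0.7

def solve(task, max_big_samples=10):
    candidate = small_model(task)          # cheap model gets the first shot
    if passes_tests(task, candidate):
        return candidate
    # Only the hardest tasks reach the big model, and we stop at the first
    # sample that passes, so the average number of big-model samples stays small.
    for _ in range(max_big_samples):
        candidate = big_model(task)
        if passes_tests(task, candidate):
            return candidate
    return None                            # escalate to a human programmer

print(solve("reverse a linked list"))
```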

Does this extrapolation/speculation seem right?

Forecasting Thread: AI Timelines

It's been a year, what do my timelines look like now?

My median has shifted to the left a bit; it's now 2030. However, I think I have somewhat less probability in the 2020-2025 range, because I've become more aware of the difficulties in scaling up compute. You can't just spend more money: you have to do lots of software engineering, and for 4+ OOMs you literally need to build more chip fabs to produce more chips. (Also because 2020 has passed without TAI/AGI/etc., so obviously I won't put as much mass there...)

So if I were to draw a distribution it would look pretty similar, just a bit more extreme of a spike and the tip of the spike might be a bit to the right.

Thoughts on gradient hacking

Even in the simple case no. 1, I don't yet quite see why Evan isn't right.

It's true that deterministically failing will create a sort of wall in the landscape that the ball will bounce off of and then roll right back into, as you said. However, wouldn't it also perhaps roll in other directions, such as parallel to the wall? Instead of getting stuck bouncing into the wall forever, the ball would bounce against the wall while also rolling in some other direction along it. (Maybe the analogy to balls and walls is leading me astray here?)
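
Here's a toy numerical version of what I'm picturing; the loss function is made up purely to illustrate the geometry, not taken from the post.

```python
# Toy picture of the ball-and-wall intuition. The max(x, 0)**2 term is the
# "wall" (the deterministic-failure region), the -0.5*x term is the gradient
# pushing the ball into that wall, and the (y - 3)**2 term is an unrelated
# direction running parallel to the wall.
def loss(x, y):
    return 10.0 * max(x, 0.0) ** 2 - 0.5 * x + (y - 3.0) ** 2

def grad(x, y, eps=1e-5):
    gx = (loss(x + eps, y) - loss(x - eps, y)) / (2 * eps)
    gy = (loss(x, y + eps) - loss(x, y - eps)) / (2 * eps)
    return gx, gy

x, y, lr = 0.5, 0.0, 0.05
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy

# x gets pinned right at the wall, but y keeps moving: the ball slides
# along the wall rather than bouncing in place forever.
print(round(x, 3), round(y, 3))
```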

MIRI/OP exchange about decision theory
My own answer would be the EDT answer: how much does your decision correlate with theirs? Modulated by ad-hoc updatelessness: how much does that correlation change if we forget "some" relevant information? (It usually increases a lot.)

I found this part particularly interesting and would love to see a fleshed-out example of this reasoning so I can understand it better.

The Codex Skeptic FAQ

That's reasonable. OTOH, if Codex is as useful as some people say it is, then more than 10% of active users might buy subscriptions, subscriptions might cost more than $15/mo, and/or people who aren't active on GitHub might also buy subscriptions.

The Codex Skeptic FAQ

For context, GitHub has 60,000,000 users. If 10% of them buy a $15/mo subscription, that's about a billion dollars in annual revenue. A billion dollars is about a thousand times more than the cost to create Codex. (The cost to train the model was negligible, since it's only the 12B-param version of GPT-3 fine-tuned; the main cost would be the salaries of the engineers involved, I imagine.)
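
The arithmetic, spelled out:

```python
# The arithmetic behind the revenue guess above.
github_users = 60_000_000
subscribers = 0.10 * github_users       # assume 10% buy a subscription
annual_revenue = subscribers * 15 * 12  # $15/month
print(f"${annual_revenue:,.0f}")        # ~$1.08B per year
```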

The Codex Skeptic FAQ

I'm extremely keen to hear from people who have used Codex a decent amount (or tried to) and decided it isn't worth it. Specifically, people who wouldn't pay $15/mo for a subscription to it. Anyone?

Analogies and General Priors on Intelligence

Thanks for doing this! I feel like this project is going to turn into a sort of wiki-like thing, very useful for people trying to learn more about AI risk and situate themselves within it. I think AI Impacts had ambitions to do something like this at one point.
