All of Daniel Kokotajlo's Comments + Replies

Is there a convenient way to make "sealed" predictions?

Also, it's a lot easier to fake by writing 10 letters with 10 different predictions and then burning the ones that don't come true.
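
One common way to "seal" a prediction without paper letters is a salted hash commitment: publish the hash now, reveal the text and salt later. The sketch below is only an illustration under that assumption; the function names `commit` and `verify` are made up for this example and nothing here comes from the original thread.

```python
import hashlib
import secrets


def commit(prediction: str) -> tuple[str, str]:
    """Return (commitment, salt). Publish the commitment; keep salt and text private."""
    # A random salt stops people from guessing short predictions by brute force.
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + prediction).encode("utf-8")).hexdigest()
    return digest, salt


def verify(commitment: str, salt: str, prediction: str) -> bool:
    """Check that a revealed (salt, prediction) pair matches the published commitment."""
    digest = hashlib.sha256((salt + prediction).encode("utf-8")).hexdigest()
    return secrets.compare_digest(digest, commitment)


if __name__ == "__main__":
    sealed, salt = commit("Example prediction text")
    print(sealed)  # this hex string is what would be posted publicly
    print(verify(sealed, salt, "Example prediction text"))
```

Note that this does not by itself fix the burn-the-misses trick: someone can publish ten commitments and reveal only the one that came true. It helps only if the commitments are posted somewhere public and timestamped, so that unrevealed ones remain visible.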

The Commitment Races problem

I agree with all this I think.

This is why I said commitment races happen between consequentialists (I defined that term more narrowly than you do; the sophisticated reasoning you do here is nonconsequentialist by my definition). I agree that agents worthy of the label "rational" will probably handle these cases gracefully and safely. 

However, I'm not yet supremely confident that the AGIs we end up building will handle these cases gracefully and safely. I would love to become more confident & am looking for ways to make it more likely. 

If toda... (read more)

GPT-3 and concept extrapolation

You would behave the exact same way as GPT-3, were you to be put in this same challenging situation. In fact I think you'd do worse; GPT-3 managed to get quite a few words actually reversed whereas I expect you'd just output gibberish. (Remember, you only have about 1 second to think before outputting each token. You have to just read the text and immediately start typing.)

5 Stuart Armstrong 1mo
The aim of this post is not to catch out GPT-3; it's to see what concept extrapolation could look like for a language model.
PaLM in "Extrapolating GPT-N performance"

Thanks Lanrian and Gwern! Alas that my quick-and-dirty method is insufficient.

PaLM in "Extrapolating GPT-N performance"

You may be interested in this image. I would be grateful for critiques; maybe I'm thinking about it wrong?

PaLM in "Extrapolating GPT-N performance"

You calculated things for the neural network brain size anchor; now here's the performance scaling trend calculation (I think):

I took these graphs from the Chinchilla paper and then made them transparent and superimposed them on one another and then made a copy on the right to extend the line. And I drew some other lines to extend them.

Eyeballing this graph it looks like whatever performance we could achieve with 10^27 FLOPs under the Kaplan scaling laws, we can now achieve with 10^25 FLOPs. (!!!) This is a big deal if true. Am I reasoning incorrectly here?... (read more)

3 Lukas Finnveden 1mo
First I gotta say: I thought I knew the art of doing quick-and-dirty calculations, but holy crap, this methodology is quick-and-dirty-ier than I would ever have thought of. I'm impressed. But I don't think it currently gets to the right answer. One salient thing: it doesn't take into account Kaplan's "contradiction". I.e., Kaplan's laws already suggested that once we were using enough FLOP, we would have to scale data faster than we have to do in the short term. So when I made my extrapolations, I used a data-exponent that was larger than the one that's represented in that graph. I now tried to figure out the answer to this question using Chinchilla's loss curves and Kaplan's adjusted-for-contradiction loss curves, but I realised... ...that Chinchilla's "loss" and Kaplan's "loss" are pretty incomparable. It's unsurprising that they're somewhat different (they might have used different datasets or something, when evaluating the loss), but I am surprised that Chinchilla's curves use an additive term that predicts that loss will never go below 1.69. What happened with the claims that ideal text-prediction performance was like 0.7? (E.g. see here for me asking why gwern estimates 0.7, and gwern responding.) Anyway, this makes it very non-obvious to me how to directly translate my benchmark extrapolations to a chinchilla context. Given that their "loss" is so different, I don't know what I could reasonably assume about the relationship between [benchmark performance as a function of chinchilla!loss] and [benchmark performance as a function of gpt-3!loss].
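
The additive floor mentioned above can be seen directly in the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^alpha + B/D^beta. Here is a minimal sketch, assuming the fitted constants as they are usually quoted from that paper (worth double-checking against the paper itself); the function name `chinchilla_loss` is just for this example.

```python
# Fitted constants as commonly quoted from the Chinchilla paper's parametric fit.
# Treat them as assumptions and verify against the paper before relying on them.
E = 1.69               # irreducible term: the loss floor discussed above
A, ALPHA = 406.4, 0.34  # model-size term
B, BETA = 410.7, 0.28   # data term


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    # Chinchilla itself: roughly 70B parameters and 1.4T tokens.
    print(chinchilla_loss(70e9, 1.4e12))
    # Absurdly large model and dataset: both power-law terms vanish, leaving ~E.
    print(chinchilla_loss(1e30, 1e30))
```

Under this functional form no amount of parameters or data pushes the predicted loss below E, which is why it is hard to compare with an estimate like 0.7 that was made on a different footing.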
PaLM in "Extrapolating GPT-N performance"

Cool. Yep, that makes sense. I'd love to see those numbers if you calculate them!

PaLM in "Extrapolating GPT-N performance"

So then... If before we looked at the Kaplan scaling and thought e.g. 50% chance that +6 OOMs would be enough... now we correct for the updated scaling laws and think 50% chance that, what, +4 OOMs would be enough? How big do you think the adjustment would be? (Maybe I can work it out by looking at some of those IsoX graphs in the paper?)

Depends on how you were getting to that +N OOMs number.

If you were looking at my post, or otherwise using the scaling laws to extrapolate how fast AI was improving on benchmarks (or subjective impressiveness), then the chinchilla laws means you should get there sooner. I haven't run the numbers on how much sooner.

If you were looking at Ajeya's neural network anchor (i.e. the one using the Kaplan scaling-laws, not the human-lifetime or evolution anchors), then you should now expect that AGI comes later. That model anchors the number of parameters in AGI to ... (read more)

PaLM in "Extrapolating GPT-N performance"

The difference between Chinchilla and Gopher was small but noticeable. Since the Kaplan and DM optimal scaling trajectories are like two lines with different slopes, should we perhaps expect the difference to get larger at greater scales?
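
One piece of this can be checked with arithmetic: how fast the two prescriptions for model size diverge. The sketch below assumes the compute-optimal parameter count scales roughly as C^0.73 under Kaplan et al. and roughly as C^0.5 under the Chinchilla paper, and it assumes (hypothetically) that the two prescriptions coincide at some reference compute budget. It says nothing about how much the performance gap grows, only the allocation gap.

```python
KAPLAN_EXPONENT = 0.73      # N_opt ~ C^0.73, as reported by Kaplan et al.
CHINCHILLA_EXPONENT = 0.50  # N_opt ~ C^0.5, approximately, per the Chinchilla paper


def size_ratio(compute: float, reference_compute: float) -> float:
    """Kaplan-prescribed parameter count divided by the Chinchilla-prescribed one.

    Assumes (hypothetically) that both prescriptions agree at reference_compute.
    """
    return (compute / reference_compute) ** (KAPLAN_EXPONENT - CHINCHILLA_EXPONENT)


if __name__ == "__main__":
    for ooms in range(0, 7, 2):
        # Each extra OOM of compute multiplies the ratio by 10**0.23.
        print(ooms, size_ratio(10.0**ooms, 1.0))
```

So on these assumptions the two trajectories drift apart by a constant factor per order of magnitude of compute, which is at least consistent with expecting a larger difference at greater scales.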

PaLM in "Extrapolating GPT-N performance"

Thanks for doing this!

Since PaLM was basically a continuation of the scaling strategy derived from the Kaplan paper, it's not surprising that it basically continues the trend. (right?)

I'd be very interested to see how Chinchilla compares, since it claims to be using a superior scaling strategy.

That's my read. It continues the Kaplan scaling. The Kaplan scaling isn't wrong (everything really does scale that way if you train that way), it's just suboptimal. PaLM is not a surprise, neither in the compute cost nor in having capability-spikes (at least, if you've been paying attention and not handwaving them away).

The surprise here is perhaps showing how bad GB/DM communications are, that DM may have let GB piss away millions of dollars of TPU time. As one Googler put it, 'we find out about this stuff the same way you do - from Twitter'.

[Link] Training Compute-Optimal Large Language Models

OK, good to know. I look forward to seeing the performance trends updated with the new scaling paradigm/law.

[Link] Training Compute-Optimal Large Language Models

(In terms of the neural network model, this means lowering our estimate for how many parameters will be needed.)

[Link] Training Compute-Optimal Large Language Models

Overall I guess this should shorten timelines, because the effect you explain here is counteracted by the other first-order effect of "oh geez it looks like our earlier scaling projections were inefficient; for any performance level, we now know how to reach that level for less compute cost than the earlier projections said." What do you think?

3 Jacob Hilton 2mo
I suppose that depends on whether you think this constitutes several years of progress over and above what you would have expected. I don't think this comes close to that, so I think the effect is much smaller.

It ought to shorten actual timelines, for the reason you say.  (Except insofar as data sourcing could actually become a practical problem.)

However, it lengthens the Bio Anchors timeline, because the parameter count in Bio Anchors is fixed.  (It's the parameter count of a model that uses about as much inference compute as the brain.)

This is a weird thing about Bio Anchors -- it asks when models will cross a threshold for the compute required to run them, so efficiency improvements of various kinds will lengthen its timeline.  It's always wait... (read more)
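
To make the fixed-parameter-count point concrete, here is a rough sketch using the standard approximation that training compute is about 6 * N * D FLOP for N parameters and D tokens. The parameter count and the tokens-per-parameter ratios below are illustrative assumptions, not numbers taken from Bio Anchors.

```python
def training_flops(n_params: float, tokens_per_param: float) -> float:
    """Approximate training compute via C ~ 6 * N * D, with D = tokens_per_param * N."""
    n_tokens = tokens_per_param * n_params
    return 6 * n_params * n_tokens


if __name__ == "__main__":
    n_fixed = 1e14  # hypothetical fixed parameter count, chosen only for illustration

    # Roughly GPT-3-like ratio (about 300B tokens for 175B parameters).
    low_data = training_flops(n_fixed, tokens_per_param=2)
    # Rough Chinchilla-style rule of thumb of about 20 tokens per parameter.
    high_data = training_flops(n_fixed, tokens_per_param=20)

    print(low_data, high_data, high_data / low_data)
```

With the parameter count held fixed, a higher tokens-per-parameter ratio translates one-for-one into more training compute, which is the sense in which a data-hungrier scaling law pushes a fixed-parameter forecast later.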

Procedurally evaluating factual accuracy: a request for research

Thanks for writing this, I'm excited to see more work on this subject!

One minor musing: I think the problem is a bit more dire than the framing "who to align to" suggests. Humans are biased, including us, including me. A system which replicates those biases and tells us/me what we would have concluded if we investigated in our usual biased way... is "aligned" in some sense, but in a very important sense is unaligned.* To use Ajeya's metaphor, it's a sycophant, not a saint. Rather than assisting us to find the truth, it'll assist us in becoming more unreas... (read more)

DeepMind: Generally capable agents emerge from open-ended play

Thanks so much! So, for comparison, fruit flies have more synapses than these XLAND/GOAT agents have parameters!

Late 2021 MIRI Conversations: AMA / Discussion

Here is a heavily condensed summary of the takeoff speeds thread of the conversation, incorporating earlier points made by Hanson, Grace, etc.


(kudos to Ben Goldhaber for pointing me to it)

ELK Thought Dump

I wrote my undergrad thesis on this problem and tentatively concluded it's unsolvable. If you read it and think you have a solution that might satisfy me, I'd love to hear it! Maybe Chalmers (linked by Jacob) solves it, idk.

A comment on Ajeya Cotra's draft report on AI timelines

(For the next few years or so at least. Specialization won't allow us to permanently have a faster trend, I think, but maybe for the next ten years...)

A comment on Ajeya Cotra's draft report on AI timelines

I think the point you make is a good one and should increase your credence in AGI/TAI/etc. by 2030. (Because if you take Ajeya's distribution and add more uncertainty to it, the left tail gets thicker as well as the right.) I'd love to see someone expand on Ajeya's spreadsheet to include uncertainty in the flops/$ and 2020flops/flop trends.
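
The tail-thickening point can be illustrated with a toy model. The sketch below assumes, purely for illustration, a normal distribution over arrival year with a made-up median of 2050; Ajeya's actual distribution is not normal and these numbers are not hers.

```python
import math


def prob_before(year: float, median: float, sigma: float) -> float:
    """P(arrival < year) under a normal distribution with the given median and spread."""
    z = (year - median) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    # Same median, more uncertainty in the second case (all numbers hypothetical).
    narrow = prob_before(2030, median=2050, sigma=10)
    wide = prob_before(2030, median=2050, sigma=15)
    print(narrow, wide)

    # The right tail thickens as well.
    print(1 - prob_before(2070, 2050, 10), 1 - prob_before(2070, 2050, 15))
```

The effect goes in this direction only because 2030 sits below the median; for a date above the median, widening the distribution lowers the probability of arrival before it.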

Re FLOPS-per-dollar trends: My impression is that what we care about is the price of compute specifically for large AGI training runs, which is different from the price of compute in general. (which is what the dataset i... (read more)

What 2026 looks like

Minor update: See e.g. these definitions from a US government website:

Misinformation is false, but not created or shared with the intention of causing harm.
Disinformation is deliberately created to mislead, harm, or manipulate a person, social group, organization, or country.
Malinformation is based on fact, but used out of context to mislead, harm, or manipulate.

(got this example from Zvi's covid post today)

Also, the recent events with GoFundMe and GiveSendGo are an instance of the trend I predicted with separate tech stacks being developed. (GoFundMe froze and/o... (read more)

Visible Thoughts Project and Bounty Announcement

Came across this today on r/mlscaling and thought I'd put it here since it's relevant:

This paper explores the ability of language models to generate a coherent chain of thought—a series of short sentences that mimic the reasoning process a person might have when responding to a question. Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks that otherwise have flat scaling curves.
Truthful LMs as a warm-up for aligned AGI

Nothing to apologize for, it was reasonably clear, I'm just trying to learn more about what you believe and why. This has been helpful, thanks!

I totally agree that in fast takeoff scenarios we are less likely to spot those things until it's too late. I guess I agree that truthful LM work is less likely to scale gracefully to AGI in fast takeoff scenarios... so I guess I agree with your overall point... I just notice I feel a bit confused and muddled about it, is all. I can imagine plausible slow-takeoff scenarios in which truthful LM work doesn't scale grac... (read more)

Truthful LMs as a warm-up for aligned AGI

Thanks, these clarifications are very helpful.

FWIW I think Paul slow takeoff is pretty unlikely for reasons to be found in this thread and this post. On the other hand, as someone who thinks fast takeoff (in various senses) is more likely than not, I don't yet see why that makes Truthful LM work significantly less useful. (By contrast I totally see why Truthful LM work is significantly less useful if AGI/TAI/etc. comes from stuff that doesn't resemble modern deep learning.)

"Catch misalignment early..." This makes it sound like misalignment is something tha... (read more)

3 Jacob Hilton 4mo
"Catch misalignment early..." - This should have been "scary misalignment", e.g. power-seeking misalignment, deliberate deception in order to achieve human approval, etc., which I don't think we've seen clear signs of in current LMs. My thinking was that in fast takeoff scenarios, we're less likely to spot this until it's too late, and more generally that truthful LM work is less likely to "scale gracefully" to AGI. It's interesting that you don't share these intuitions. As mentioned, this phrase should probably be replaced by "a significant portion of the total existential risk from AI comes from risks other than power-seeking misalignment". There isn't supposed to be a binary cutoff for "significant portion"; the claim is that the greater the risks other than power-seeking misalignment, the greater the comparative advantage of truthful LM work. This is because truthful LM work seems more useful for addressing risks from social problems such as AI persuasion (as well as other potential risks that haven't been as clearly articulated yet, I think). Sorry that my original phrasing was so unclear.
Truthful LMs as a warm-up for aligned AGI

Thanks for this!

I think that working on truthful LMs has a comparative advantage in worlds where:
--We have around 10-40 years until transformative AI
--Transformative AI is built using techniques that resemble modern deep learning
--There is a slow takeoff
--Alignment does not require vastly more theoretical insight (but may require some)
--Our current picture of the risks posed by transformative AI is incomplete

Can you elaborate on what you mean by slow takeoff here?

Also, what do you mean by the current picture of the risks being incomplete? What would it ev... (read more)

2 Jacob Hilton 4mo
Thanks for these questions, these phrases were ambiguous or poorly chosen:
* By "slow takeoff", I had in mind the "Paul slow takeoff" definition, although I think the (related) "Continuous takeoff" definition is more relevant to this post. The point is that trying for alignment to continually keep pace with capabilities, and to catch misalignment early, seems less valuable if there is going to be a sudden jump in capabilities. (I could be wrong about this, as I don't think I understand the fast takeoff viewpoint well.)
* By "our current picture of the risks is incomplete", I meant something like: a significant portion of the total existential risk from AI comes from scenarios that have not yet been clearly articulated. More specifically, I had in mind power-seeking misalignment as the most clearly articulated risk, so I think it would have been better to say: a significant portion of the total existential risk from AI comes from risks other than power-seeking misalignment. Examples of potential sources of such risk include AI persuasion, social upheaval, deliberate misuse, authoritarianism and unforeseen risks.
Forecasting Thread: AI Timelines

If I understand you correctly, you are asking something like: How many programmer-hours of effort and/or how much money was being spent specifically on scaling up large models in 2020? What about in 2025? Is the latter plausibly 4 OOMs more than the former? (You need some sort of arbitrary cutoff for what counts as large. Let's say GPT-3 sized or bigger.)

Yeah maybe, I don't know! I wish I did. It's totally plausible to me that it could be +4 OOMs in this metric by 2025. It's certainly been growing fast, and prior to GPT-3 there may not have been much of it at all.

1 Michaël Trazzi 4mo
Yes, something like: given (programmer-hours-into-scaling(July 2020) - programmer-hours-into-scaling(Jan 2022)), and how much progress there has been on hardware for such training (I don't know the right metric for this, but probably something to do with FLOP and parallelization), the extrapolation to 2025 (either linear or exponential) would give the 4 OOM you mentioned.
Risks from AI persuasion

Thank you for doing this research! There's a lot that I love about this piece, besides the obvious thing which is that it is seriously investigating a very important and neglected topic. For example, I love the definitions of various things, I love the little vignettes of ways the near future could go, and I love the core argument about why these possibilities are both plausible and really bad.

I am pretty much on the same page as you, but I have a few minor disagreements (mostly about emphasis). TL;DR is that you focus on more extreme, exotic possibilities... (read more)

Reply to Eliezer on Biological Anchors

Thanks, nice answers!

I agree it would be good to extend the bio anchors framework to include more explicit modelling of data requirements and the like instead of just having it be implicit in the existing variables. I'm generally a fan of making more detailed, realistic models and this seems reasonably high on the priority list of extensions to make. I'd also want to extend the model to include multiple different kinds of transformative task and dangerous task, and perhaps to include interaction effects between them (e.g. once we get R&D acceleration t... (read more)

Re: humans/brains, I think what humans are a proof of concept of is that, if you start with an infant brain, and expose it to an ordinary life experience (a la training / fine-tuning), then you can get general intelligence. But I think this just doesn't bear on the topic of Bio Anchors, because Bio Anchors doesn't presume we have a brain, it presumes we have transformers. And transformers don't know what to do with a lifetime of experience, at least nowhere near as well as an infant brain does. I agree we might learn more about AI from examining humans! But that's leaving the Bio Anchors framing of "we just need compute" and getting into the framing of algorithmic improvements etc. I don't disagree broadly that some approaches to AI might not have as big a (pre-)training phase the way current models do, if, for instance, they figure out a way to "start with" infant brains. But I don't see the connection to the Bio Anchors framing. What's so bad about perplexity? I'm not saying perplexity is bad per se, just that it's unclear how much data you need, with perplexity as your objective, to achieve general-purpose language facility. It's unclear both because the connection between perplexity and extrinsic linguistic tasks is unclear, and because we don't have great ways of measuring extrinsic linguistic tasks. For instance, the essay you cite itself cites two very small experiments showing correlation between perplexity and extrinsic tasks. One of them is a regression on 8 data points, the other has 24 data points. So I just wouldn't put too much stake in extrapolations there. Furthermore, and this isn't against perplexity, but I'd be skeptical of the other variable i.e. the linguistic task perplexity is regressed against: in both cases, a vague human judgement of whether model output is "human-like". I think there's not much reason to think that is correlated to some general-purpose language facility. Attempts like this to (roughly speaking) operationalize the Turing t
Risks from AI persuasion

Awesome post! I'll have more to say later, but for now, check out this experiment I ran with GPT-3:

Inspired by this bit of the post:

If unfavorable regulation is threatened, companies use their widespread companion bots to sway public opinion, making people feel sympathetic for their AI companion who ‘is afraid of getting modified or shut down’ by some regulation.

I decided to ask GPT-3 in chatbot mode what it thought about regulation like this. I did 5 trials; tl;dr is that GPT-3 supported regulation twice and opposed it twice and got confused once.

What l... (read more)

Reply to Eliezer on Biological Anchors

I occasionally hear people make this point but it really seems wrong to me, so I'd like to hear more! Here are the reasons it seems wrong to me:

1. Data generally seems to be the sort of thing you can get more of by throwing money at. It's not universally true but it's true in most cases, and it only needs to be true for at least one transformative or dangerous task. Moreover, investment in AI is increasing; when a tech company is spending $10,000,000,000 on compute for a single AI training run, they can spend 10% as much money to hire 2,000 trained profess... (read more)

Yes, good questions, but I think there are convincing answers. Here's a shot:

1. Some kinds of data can be created this way, like parallel corpora for translation or video annotated with text. But I think it's selection bias that it seems like most cases are like this. Most of the cases we're familiar with seem like this because this is what's easy to do! But transformative tasks are hard, and creating data that really contains latent in it the general structures necessary for task performance, that is also hard. I'm not saying research can't solve it, but ... (read more)

Interlude: Agents as Automobiles

Fair. If you don't share my intuition that people in 1950 should have had more than 90% credence that computers would be militarily useful, or that people at the dawn of steam engines should have predicted that automobiles would be useful (conditional on them being buildable) then that part of my argument has no force on you.

Maybe instead of picking examples from the past, I should pick an example of a future technology that everyone agrees is 90%+ likely to be super useful if developed, even though Joe's skeptical arguments can still be made.

Interlude: Agents as Automobiles

Thanks! Ah, I shouldn't have put the word "portable" in there then. I meant to be talking about computers in general, not computers-on-missiles-as-opposed-to-ground-installations.

Also, the whole setup of picking something which we already know to be widespread (cars) and then applying Joe's arguments to it, seems like it shouldn't tell us much. If Joe were saying 1% yes, 99% no for incentives to build APS systems, then the existence of counterexamples like cars which have similar "no" arguments would be compelling. But he's saying 80% yes, 20% no, and so
... (read more)
4 Richard Ngo 5mo
I guess I just don't feel like you've established that it would have been reasonable to have credence above 90% in either of those cases. Like, it sure seems obvious to me that computers and automobiles are super useful. But I have a huge amount of evidence now about both of those things that I can't really un-condition on. So, given that I know how powerful hindsight bias can be, it feels like I'd need to really dig into the details of possible alternatives before I got much above 90% based on facts that were known back then. (Although this depends on how we're operationalising the claims. If the claim is just that there's something useful which can be done with computers - sure, but that's much less interesting. There's also something useful that can be done with quantum computers, and yet it seems pretty plausible that they remain niche and relatively uninteresting.)
Interlude: Agents as Automobiles

Musings on ways in which the analogy is deep, or at least seems deep to me right now: (It's entirely possible my pattern-recognizer is overclocking and misfiring here... this is fun to think about though)

Automobiles move the cargo to the target. To do this, they move themselves. Agents improve the world according to some evaluation function. To do this, they improve their own power/capability/position.

You often don't know what's inside an automobile until the thing reaches its destination and gets unloaded. Similarly, you often don't know what's inside an ... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works

I guess I would say: Ajeya's framework/model can incorporate this objection; this isn't a "get rid of the whole framework" objection but rather a "tweak the model in the following way" objection.

Like, I agree that it would be bad if everyone who used Ajeya's model had to put 100% of their probability mass into the six bio anchors she chose. That's super misleading/biasing/ignores loads of other possible ways AGI might happen. But I don't think of this as a necessary part of Ajeya's model; when I use it, I throw out the six bio anchors and just directly input my probability distribution over OOMs of compute. My distribution is informed by the bio anchors, of course, but that's not the only thing that informs it.

2 Adam Shimi 5mo
First, I want to clarify that I feel we're going into a more interesting place, where there's a better chance that you might find a point that invalidates Yudkowsky's argument, and can thus convince him of the value of the model. But it's also important to realize that IMO, Yudkowsky is not just saying that biological anchors are bad. The more general problem (which is also developed in this post) is that predicting the Future is really hard. In his own model of AGI timelines, the factor that is basically impossible to predict until you can make AGI is the "how much resources are needed to build AGI". So saying "let's just throw away the biological anchors" doesn't evade the general counterargument that to predict timelines at all, you need to find information on "how much resources are needed to build AGI", and that is incredibly hard. If you or Ajeya can argue for actual evidence in that last question, then yeah, I expect Yudkowsky would possibly update on the validity of the timeline estimates. But at the moment, in this thread, I see no argument like that.
Biology-Inspired AGI Timelines: The Trick That Never Works

Thanks for this comment (and the other comment below also).

I think we don't really disagree that much here. I may have just poorly communicated, slash maybe I'm objecting to the way Yudkowsky said things because I read it as implying things I disagree with.

I don't think this is what Yudkowsky is saying at all in the post. Actually, I think he is saying the exact opposite: that 2.5 years estimate is too fast as an estimate that is supposed to always work. If I understand correctly, his point is that you have significantly less than that most of the time, ex
... (read more)
2 Adam Shimi 5mo
Hum, I would say Yudkowsky seems to agree with the value of a probability distribution for timelines. (Quoting The Weak Inside View (2008) from the AI FOOM Debate) On the other hand, my interpretation of Yudkowsky strongly disagrees with the second part of your paragraph: So my interpretation of the text is that Yudkowsky says that you need to know how compute will be transformed into AGI to estimate the timelines (then you can plug your estimates for the compute), and that the default of any approach which relies on biological analogies for that part will be spouting nonsense, because evolution and biology optimize in fundamentally different ways than human researchers do. For each of the three examples, he goes into more detail about the way this is instantiated. My understanding of his criticism of Ajeya's model is that he disagrees that just current deep learning algorithms are actually a recipe for turning compute into AGI, and so saying "we keep to current deep learning and estimate the required compute" doesn't make sense and doesn't solve the question of how to turn compute into AGI. (Note that this might be the place where you or someone defending Ajeya's model want to disagree with Yudkowsky. I'm just pointing out that this is a more productive place to debate him because that might actually make him change his mind, or change your mind if he convinces you.) The more general argument (the reason why "the trick" doesn't work) is that if you actually have a way of transforming compute into AGI, that means you know how to build AGI. And if you do, you're very, very close to the end of the timeline.
Biology-Inspired AGI Timelines: The Trick That Never Works

Strongly disagree with this, to the extent that I think this is probably the least cruxy topic discussed in this post, and thus the comment is as wrong as is physically possible.

Hahaha ok, interesting! If you are right I'll take some pride in having achieved that distinction. ;)

I interpreted Yudkowsky as claiming that Ajeya's model had enough free parameters that it could be made to predict a wide range of things, and that what was actually driving the 30-year prediction was a bunch of implicit biases rather than reality. Platt's Law is evidence for this c... (read more)

4 Adam Shimi 5mo
I do think you are misconstruing Yudkowsky's argument. I'm going to give evidence (all of which is relatively strong IMO) in order of "ease of checkability". So I'll start with something anyone can check in a couple of minutes, and close with the more general interpretation that requires rereading the post in detail.
Evidence 1: Yudkowsky flags Simulated-Eliezer as talking smack in the part you're mentioning. If I follow you correctly, your interpretation mostly comes from this part: Note that this is one of the two times in this dialogue where Simulated-OpenPhil calls out Simulated-Eliezer. But remember that this whole dialogue was written by Yudkowsky! So he is flagging himself that this particular answer is a quip. Simulated-Eliezer doesn't reexplain it as he does most of his insulting points to Humbali; instead Simulated-Eliezer goes for a completely different explanation in the next answer.
Evidence 2: Platt's law is barely mentioned in the whole dialogue. "Platt" is used 6 times in the 20k-word piece. "30 years" is used 8 times (basically at the same place where "Platt" is used).
Evidence 3: Humbali spends far more time discussing and justifying the "30 years" time than Simulated-OpenPhil. And Humbali is the strawman character, whereas Simulated-OpenPhil actually tries to discuss and to understand what Simulated Eliezer is saying.
Evidence 4: There is an alternative interpretation that takes into account the full text and doesn't use Platt's law at all: see this comment on your other thread for my current best version of that explanation.
Evidence 5: Yudkowsky's whole criticism relying on a purely empirical and superficial similarity goes contrary to everything that I extracted from his writing in my recent post [
Against GDP as a metric for timelines and takeoff speeds

(I am the author)

I still like & stand by this post. I refer back to it constantly. It does two things:

1. Argue that an AI-induced point of no return could come significantly before, or significantly after, world GDP growth accelerates--and indeed will probably come before!

2. Argue that we shouldn't define timelines and takeoff speeds in terms of economic growth. So, against "is there a 4 year doubling before a 1 year doubling?" and against "When will we have TAI = AI capable of doubling the economy in 4 years if deployed?"

I think both things are pretty impo... (read more)

Cortés, Pizarro, and Afonso as Precedents for Takeover

(I am the author)

I still like & endorse this post. When I wrote it, I hadn't read more than the wiki articles on the subject. But then afterwards I went and read 3 books (written by historians) about it, and I think the original post held up very well to all this new info. In particular, the main critique the post got -- that disease was more important than I made it sound, in a way that undermined my conclusion -- seems to have been pretty wrong. (See e.g. this comment thread, these follow up posts)

So, why does it matter? What contribution did this po... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works

Adding to what Paul said: jacob_cannell points to this comment which claims that in Mind Children Moravec predicted human-level AGI in 2028.

Moravec, "Mind Children", page 68: "Human equivalence in 40 years". There he is actually talking about human-level intelligent machines arriving by 2028 - not just the hardware you would theoretically require to build one if you had the ten million dollars to spend on it.

I just went and skimmed Mind Children. He's predicting human-equivalent computational power on a personal computer in 40 years. He seems to say tha... (read more)

Extrapolating GPT-N performance

We all saw the GPT performance scaling graphs in the papers, and we all stared at them and imagined extending the trend for another five OOMs or so... but then Lanrian went and actually did it! Answered the question we had all been asking! And rigorously dealt with some technical complications along the way.

I've since referred to this post a bunch of times. It's my go-to reference when discussing performance scaling trends.

Draft report on AI timelines

Ajeya's timelines report is the best thing that's ever been written about AI timelines imo. Whenever people ask me for my views on timelines, I go through the following mini-flowchart:

1. Have you read Ajeya's report?

--If yes, launch into a conversation about the distribution over 2020's training compute and explain why I think the distribution should be substantially to the left, why I worry it might shift leftward faster than she projects, and why I think we should use it to forecast AI-PONR instead of TAI.

--If no, launch into a conversation about Ajey... (read more)

Moore's Law, AI, and the pace of progress

Ok, thanks! I defer to your judgment on this, you clearly know way more than me. Oh well, there goes one of my hopes for the price of compute reaching a floor.

An overview of 11 proposals for building safe advanced AI

This post is the best overview of the field so far that I know of. I appreciate how it frames things in terms of outer/inner alignment and training/performance competitiveness--it's very useful to have a framework with which to evaluate proposals and this is a pretty good framework I think.

Since it was written, this post has been my go-to reference for getting other people up to speed on what the current AI alignment strategies look like (even though this post isn't exhaustive). Also, I've referred back to it myself several times. I learned a lot from... (read more)

Moore's Law, AI, and the pace of progress

Well done!

What do you think about energy costs? Last I thought about this it seemed plausible to me that in ten years or so the atoms making up supercomputers will be <10% of the cost of training giant models, most of the cost being paying for the electricity and upkeep and rent.

Lifetime energy costs are already significant, but I don't think the problem will get that skewed this decade. IRDS' predicted transistor scaling until ~2028 should prevent power density increasing by too much. Longer-term this does become a greater concern. I can't say I have particularly wise predictions here. There are ways to get more energy efficiency by spending more on lower-clocked hardware, or by using a larger memory:compute ratio, and there are also hardware architectures with plausible significant power advantages. There are even potential ways for energy to fall in price, like with solar PV or fusion, though I haven't a good idea how far PV prices could fall, and for fusion it seems like a roll of the dice what the price will be. It's entirely possible energy does just become the dominant cost and none of those previous points matter, but it's also an input we know we can scale up pretty much arbitrarily if we're willing to spend the money. It's also something that only starts to become a fundamental economic roadblock after a lot more scaling. For instance, the 100,000 wafer scale processor example requires a lot of power, but only about as much as the largest PV installations that currently exist. You could then upgrade it to 2028 technology and stack memory on top of the wafers without changing power density by all that much. This is likely a topic worth periodically revisiting as the issue gets closer.
More Christiano, Cotra, and Yudkowsky on AI progress

Thanks! Wow I missed/forgot that 30% figure, my bad. I disagree with you much less than I thought! (I'm more like 70% instead of 30%). [ETA: Update: I'm going with the intuitive definition of takeoff speeds here, not the "doubling in 4 years before 1 year?" one. For my thoughts on how to define takeoff speeds, see here. If GWP doubling times is the definition we go with then I'm more like 85% fast takeoff I think, for reasons mentioned by Rob Bensinger below.]

3Evan R. Murphy5mo
So here y'all have given your sense of the likelihoods as follows:
* Paul: 70% soft takeoff, 30% hard takeoff
* Daniel: 30% soft takeoff, 70% hard takeoff
How would Eliezer's position be stated in these terms? Similar to Daniel's?
More Christiano, Cotra, and Yudkowsky on AI progress

1. Everyone agrees that if we have less than 10 years left before the end, it's probably not going to look like the multi-year, gradual, distributed takeoff Paul prophesies, and instead will look crazier, faster, more discontinuous, more Yudkowskian... right? In other words, everyone agrees <10-year timelines and Paul-slow takeoff are in tension with each other.*

2. Assuming we agree on 1, I'd be interested to hear whether people think we should resolve this tension by having low credence in <10 year timelines, or not having low credence in Yudkowskia... (read more)

I still expect things to be significantly more gradual than Eliezer does. In the 10-year world I think it will be very fast, but we still have much tighter bounds on how fast (maybe median is more like a year and very likely 2+ months). But yes, the timeline will be much shorter than my default expectation, and then you also won't have time for big broad impacts.

I don't think you should have super low credence in fast takeoff. I gave 30% in the article that started this off, and I'm still somewhere in that ballpark.

Perhaps you think this implies a "low credence" in <10 year timelines. But I don't really think the arguments about timelines are "solid" to the tune of 20%+ probability in 10 years.

Misc. questions about EfficientZero

Like Gwern said, the goal is to win. If EfficientZero gets superhuman data-efficiency by "cheating," well it still got superhuman data-efficiency...

I think a relevant comparison here would be total cost. EfficientZero took 7 hours on 4 GPUs to master each particular Atari game. Equipment probably cost around $10,000 last I checked. How long is the lifespan of a GPU? Two years? OK, so that's something like 20,000 hours of 4 GPUs' time for $10K, so $0.50 for one hour, so $3.50 for the training run? Eh, maybe it's a bit more than that due to energy costs or s... (read more)
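The back-of-envelope arithmetic above can be reproduced in a few lines. This is only a sketch using the comment's own rough guesses ($10,000 of equipment, a two-year GPU lifespan, 7 hours per game); the function name and the assumption of 100% utilization over the hardware's lifetime are illustrative, and energy and upkeep are ignored.

```python
# Back-of-envelope hardware cost of one EfficientZero training run.
# All inputs are the rough guesses quoted in the comment, not measured values.

EQUIPMENT_COST_USD = 10_000   # guessed price of the 4-GPU machine
LIFESPAN_YEARS = 2            # guessed useful life of the GPUs
TRAINING_HOURS = 7            # hours on 4 GPUs to master one Atari game
HOURS_PER_YEAR = 24 * 365


def training_run_cost(equipment_cost, lifespan_hours, training_hours):
    """Amortized hardware cost of a run (hypothetical helper).

    Assumes the machine is busy for its whole lifespan and ignores
    electricity, upkeep, and rent.
    """
    hourly_rate = equipment_cost / lifespan_hours
    return hourly_rate * training_hours


# The comment rounds two years up to 20,000 hours.
rounded_cost = training_run_cost(EQUIPMENT_COST_USD, 20_000, TRAINING_HOURS)

# Using the actual number of hours in two years instead.
exact_hours = LIFESPAN_YEARS * HOURS_PER_YEAR
exact_cost = training_run_cost(EQUIPMENT_COST_USD, exact_hours, TRAINING_HOURS)

print(f"rounded lifespan (20000 h): ${rounded_cost:.2f} per run")
print(f"exact lifespan ({exact_hours} h): ${exact_cost:.2f} per run")
```

With the rounded lifespan this gives the $3.50 figure; with the exact 17,520 hours it comes to roughly $4, so the rounding does not change the conclusion that a run costs a few dollars of hardware time.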

4Steve Byrnes6mo
Hmm, I think my comment came across as setting up a horse-race between EfficientZero and human brains, in a way that I didn't intend. Sorry for bad choice of words. In particular, when I wrote "how AI compares to human brains", I meant in the sense of "In what ways are they similar vs different? What are their relative strengths and weaknesses? Etc.", but I guess it sounded like I was saying "human brain algorithms are better and EfficientZero is worse". Sorry. I could write a "human brain algorithms are fundamentally more powerful than EfficientZero" argument, but I wasn't trying to, and such an argument sure as heck wouldn't fit in a comment. :-) Sure. If Atari sample efficiency is what we ultimately care about, then the results speak for themselves. For my part, I was using sample efficiency as a hint about other topics that are not themselves sample efficiency. For example, I think that if somebody wants to understand AlphaZero, the fact that it trained on 40,000,000 games of self-play is a highly relevant and interesting datapoint. Suppose you were to then say "…but of those 40,000,000 games, fundamentally it really only needed 100 games with the external simulator to learn the rules. The other 39,999,900 games might as well have been 'in its head'. This was proven in follow-up work." I would reply: "Oh. OK. That's interesting too. But I still care about the 40,000,000 number. I still see that number as a very important part of understanding the nature of AlphaZero and similar systems." (I'm not sure we're disagreeing about anything…)
Biology-Inspired AGI Timelines: The Trick That Never Works

These are three separate things:

(a) What is the meaning of "2020-FLOPS-equivalent that TAI needs?"

(b) Can you build TAI with 2020 algorithms without some truly astronomical amount of FLOPs?

(c) Why should we believe the "neural anchor?"

(a) is answered roughly in my linked post and in much more detail and rigor in Ajeya's doc.

(b) depends on what you mean by truly astronomical; I think it would probably be doable for 10^35, Ajeya thinks 50% chance.

For (c), I actually don't think we should put that much weight on the "neural anchor," and I don't think Ajeya's... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works
What is the meaning of "2020-FLOPS-equivalent that TAI needs"? Plausibly you can't build TAI with 2020 algorithms without some truly astronomical amount of FLOPs.

I think 10^35 would probably be enough. This post gives some intuition as to why, and also goes into more detail about what 2020-flops-equivalent-that-TAI-needs means. If you want even more detail + rigor, see Ajeya's report. If you think it's very unlikely that 10^35 would be enough, I'd love to hear more about why -- what are the blockers? Why would OmegaStar, SkunkWorks, etc. described in the p... (read more)

2Vanessa Kosoy6mo
I didn't ask how much, I asked what it even means. I think I understand the principles of Cotra's report. What I don't understand is why we should believe the "neural anchor" when (i) modern algorithms applied to a brain-sized ANN might not produce brain-performance and (ii) the compute cost of future algorithms might behave completely differently. (i.e. I don't understand how Carl's and Mark's arguments in this thread protect the neural anchor from Yudkowsky's criticism.)
Misc. questions about EfficientZero

OK, here are my guesses, without seeing anyone else's answers. I think I'm probably wrong, which is why I'm asking this question:

1.a. Underdetermined? It depends on what we mean by the outer objective, and what we mean when we assume it has no inner alignment problems? See e.g. this discussion. That said, yeah it totally seems possible. If the part that predicts reward gets good at generalizing, it should be able to reason/infer/guess that hacking the reward function would yield tons of reward. And then that's what the agent would do.

1.b. Yes? Even though ... (read more)
