Thanks for this, this is awesome! I'm hopeful that in the next few years there will be a collection of stories like this.
This is a story where the alignment problem is somewhat harder than I expect, society handles AI more competently than I expect, and the outcome is worse than I expect. It also involves inner alignment turning out to be a surprisingly small problem. Maybe the story is 10-20th percentile on each of those axes.
I'm a bit surprised that the outcome is worse than you expect, considering that this scenario is "easy mode" for ... (read more)
Not according to this paper! They were able to get performance comparable to full-size networks, it seems. IDK.
I totally agree that you still have to do all the matrix multiplications of the original model etc. etc. I'm saying that you'll need to do them fewer times, because you'll be training on less data.
Each step costs, say, 6*N FLOP, where N is the parameter count. And then you do D steps, where D is how many data points you train on. So the total FLOP cost is 6*N*D. When you fine-tune, you still spend 6*N for each data point, but you only need to train on 0.001*D data points, at least according to the scaling laws, at least according to the orthodox inter... (read more)
I think compute cost equals data x parameters, so even if parameters are the same, if data is 3 OOM smaller, then compute cost will be 3 OOM smaller.
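To make the arithmetic concrete, here's a toy version of the calculation (the N and D values are made-up placeholders, not estimates from the scaling laws or from Ajeya's report):

```python
# Toy illustration of the 6*N*D rule of thumb for training cost.
# N and D are placeholder values, chosen only to make the arithmetic visible.
N = 1e11                          # parameter count
D_scratch = 1e12                  # data points when training from scratch
D_finetune = 0.001 * D_scratch    # fine-tuning needs ~0.1% as much data

flop_scratch = 6 * N * D_scratch
flop_finetune = 6 * N * D_finetune

print(f"from scratch: {flop_scratch:.1e} FLOP")   # 6.0e+23
print(f"fine-tuning:  {flop_finetune:.1e} FLOP")  # 6.0e+20, i.e. 3 OOMs cheaper
```

Same parameter count both times; the 3-OOM saving comes entirely from training on 3 OOMs less data.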
I'm not sure I understand your edit question. I'm referring to the scaling laws as discussed and interpreted by Ajeya. Perhaps part of what's going on is that in the model sizes we've explored so far, bigger models only need a little bit more data, because bigger models are more data-efficient. But very soon it is prophesied that this will stop and we will transition to a slower scaling ... (read more)
1. I concede that we're not in a position of complete ignorance w.r.t. the new evidence's impact on alternate hypotheses. However, the same goes for pretty much any argument anyone could make about anything. In my particular case I think there's some sense in which, plausibly, for most underlying views on timelines people will have, my post should cause an update more or less along the lines I described. (see below)
2. Even if I'm wrong about that, I can roll out the anti-spikiness argument to argue in favor of <7 OOMs, tho... (read more)
Thanks! Your answer no. 2 is especially convincing to me; I didn't realize the authors used smaller models as the comparison--that seems like an unfair comparison! I would like to see how well these 0.1%-tuned transformers do compared to similarly-sized transformers trained from scratch.
I think I'm just not seeing why you think the >12 OOM mass must all go somewhere other than the <4 OOM (or really, I would argue, <7 OOM) case. Can you explain more?
Maybe the idea is something like: There are two underlying variables, 'We'll soon get more ideas' and 'current methods scale.' If we get new ideas soon, then <7 are needed. If we don't but 'current methods scale' is true, 7-12 are needed. If neither variable is true then >12 is needed. So then we read my +12 OOMs post and become convinced th... (read more)
Hmmm, if this is the most it's been done, then that counts as a No in my book. I was thinking something like "Ah yes, the Viet Cong did this for most of the war, and it's now standard in both the Vietnamese and Chinese armies." Or at least "Some military somewhere has officially decided that this is a good idea and they've rolled it out across a large portion of their force."
In the 1-2-3 coin case, seeing that y is heads rules out 3, but it also rules out half of 1. (There are two versions of hypothesis 1: the y-heads version and the y-tails version.) To put it another way, P(yheads|1)=0.5. So we are ruling-out-and-renormalizing after all, even though it may not appear that way at first glance.
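Here's the same point as a quick toy calculation (a sketch; the equal priors and the likelihood for hypothesis 2 are made-up illustrative values, not part of the original setup):

```python
# Two equivalent ways to update on "y is heads" in the 1-2-3 coin case.
# Illustrative assumptions: equal priors, P(heads|1)=0.5, P(heads|2)=1, P(heads|3)=0.
priors = {1: 1/3, 2: 1/3, 3: 1/3}
likelihood_heads = {1: 0.5, 2: 1.0, 3: 0.0}

# (a) Ordinary Bayes: posterior = prior * likelihood, then normalize.
unnorm = {h: priors[h] * likelihood_heads[h] for h in priors}
bayes = {h: v / sum(unnorm.values()) for h, v in unnorm.items()}

# (b) Split hypothesis 1 into (1, y=heads) and (1, y=tails) sub-hypotheses,
#     rule out every sub-hypothesis inconsistent with the observation,
#     and renormalize whatever survives.
sub = {(1, "heads"): 1/6, (1, "tails"): 1/6, (2, "heads"): 1/3, (3, "tails"): 1/3}
surviving = {k: v for k, v in sub.items() if k[1] == "heads"}
renorm = {k: v / sum(surviving.values()) for k, v in surviving.items()}

print(bayes)   # {1: 0.333..., 2: 0.666..., 3: 0.0}
print(renorm)  # {(1, 'heads'): 0.333..., (2, 'heads'): 0.666...} -- same masses
```

Either way, hypothesis 3 goes to zero, hypothesis 1 loses half its mass, and the survivors get renormalized.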
The question is, is something similar happening with the AI OOMs?
I think if the evidence leads us to think things like "This doesn't say anything about TAI at +4 OOM, since my prediction is based on orthogonal variables" ... (read more)
Thanks, this is a great thing to be thinking about and a good list of ideas!
Do other subjects come to mind?
Public speaking skills, persuasion skills, debate skills, etc.
Practice no-cost-too-large productive periods
I like this idea. At AI Impacts we were discussing something similar: having "fire drills" where we spend a week (or even just a day) pretending that a certain scenario has happened, e.g. "DeepMind just announced they have a Turing-test-passing system and will demo it a week from now; we've got two journalists asking us fo... (read more)
this feels like a situation where our naive intuitions about power are just wrong, and if you think about it more, the formal result reflects a meaningful phenomenon.
Different strokes for different folks, I guess. It feels very different to me.
We now need to reassign most of the 30% mass we have on >13 OOM, but we can't simply renormalise: we haven't (necessarily) gained any information on the viability of [approach X]. Our post-update [TAI <= 5OOM] credence should remain almost exactly 20%. Increasing it to ~26% would not make any sense.
I don't see why this is. From a bayesian perspective, alternative hypotheses being ruled out == gaining evidence for a hypothesis. In what sense have we not gained any information on the viability of approach X? We've learned that one of the alternatives to X (the at least 13 OOM alternative) won't happen.
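Here's a toy version of the renormalization I have in mind (the bucket boundaries and masses are illustrative, chosen only to match the 20%-to-~26% figures in the quote above, not anyone's actual credences):

```python
# Sketch: mostly rule out the ">12 OOM" bucket, then renormalize the rest,
# preserving the relative strengths of the surviving hypotheses.
prior = {"<=5 OOM": 0.20, "6-12 OOM": 0.50, ">12 OOM": 0.30}   # illustrative

post_gt12 = 0.10   # new evidence: at most ~10% credence that >12 OOMs are needed
scale = (1 - post_gt12) / (prior["<=5 OOM"] + prior["6-12 OOM"])

posterior = {
    "<=5 OOM": prior["<=5 OOM"] * scale,    # ~0.257
    "6-12 OOM": prior["6-12 OOM"] * scale,  # ~0.643
    ">12 OOM": post_gt12,                   # 0.10
}
print(posterior)
```

The mass freed up from the >12 OOM bucket has to go somewhere, and spreading it in proportion to the surviving buckets' prior strengths is exactly what lifts the <=5 OOM credence from 20% to ~26%.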
It initially seems unintuitive that as players' strategies improve, their collective Power tends to decrease. The proximate cause of this effect is something like "as your strategy improves, other players lose the power to capitalize off of your mistakes".
"I disagree. The whole point of a zero-sum game (or even constant sum game) is that not everyone can win. So playing better means quite intuitively that the others can be less sure of accomplishing their own goals."
IMO, the unintuitive and potentially problematic thing is not that... (read more)
I'm not sure, but I think that's not how updating works? If you have a bunch of hypotheses (e.g. "It'll take 1 more OOM," "It'll take 2 more OOMs," etc.) and you learn that some of them are false or unlikely (say, only a 10% chance of it taking more than 12 OOMs), then you should redistribute the mass over all your remaining hypotheses, preserving their relative strengths. And yes I have the same intuition about analogical arguments too. For example, let's say you overhear me talking about a bridge being built near my h... (read more)
Thanks! Well, I agree that I didn't really do anything in my post to say how the "within 12 OOMs" credence should be distributed. I just said: If you distribute it like Ajeya does except that it totals to 80% instead of 50%, you should have short timelines.
There's a lot I could say about why I think within 6 OOMs should have significant probability mass (in fact, I think it should have about as much mass as the 7-12 OOM range). But for now I'll just say this: If you agree with me re Question Two, and put (say) 80%+ probability mass by +12 OOMs, but you als... (read more)
Thanks for doing this! I'm honored that you chose my post to review and appreciate all the thought you put into this.
I have one big objection: The thing you think this post assumes, this post does not actually assume. In fact, I don't even believe it! In more detail:
The relevance of this work appears to rely mostly on the hypothesis that the +12 OOMs of compute and all relevant resources could plausibly be obtained in a short time frame. If not, then the arguments made by Daniel wouldn’t have the consequence of making
I never meant to claim "20 years is definitely in the realm of possibility" but rather "even if it takes 20 years, that's still not necessarily enough to declare that we're all good".
Ah, OK. We are on the same page then.
Thanks! Yeah, there are plenty of people who think takeoff will take more than a decade--but I guess I'll just say, I'm pretty sure they are all wrong. :) But we should take care to define what the start point of takeoff is. Traditionally it was something like "When the AI itself is doing most of the AI research," but I'm very willing to consider alternate definitions. I certainly agree it might take more than 10 years if we define things in such a way that takeoff has already begun.
Yeah, sorry, when I said "accidents" I
Some nitpicks about your risk model slash ways in which my risk model differs from yours:
1. I think AIs are more likely to be more homogeneous on Earth; even in a slow takeoff they might all be rather similar to each other. Partly for the reasons Evan discusses in his post, and partly because of acausal shenanigans. I certainly think that, unfortunately, given all the problems you describe, we should count ourselves lucky if any of the contending AI factions are aligned to our values. I think this is an important research area.
2. I am perhaps more optimisti... (read more)
Great post! I think many of the things you say apply equally well to broader categories of scenario too, e.g. your AGI risk model stuff works (with some modification) for different AGI development models than the one you gave. I'd love to see people spell that out, lest skeptics read this post and reply "but that's not how AGI will be made, therefore this isn't a serious problem."
Assuming slow takeoff (again, fast takeoff is even worse), it seems to me that under these assumptions there would probably be a series of increasingly-w
Thinking about politics may not be a failure mode; my question was whether it feels "extreme and somewhat strange," sorry for not clarifying. Like, suppose for some reason "doesn't think about politics" was on your list of desiderata for the extremely powerful AI you are building. So thinking about politics would in that case be a failure mode. Would it be an extreme and somewhat strange one?
I'd be interested to hear more about the law-breaking stuff -- what is it about some laws that makes AI breaking them unsurprising/normal... (read more)
To make sure I understand: you are saying (a) that our AIs are fairly likely to get significantly more sample-efficient in the near future, and (b) even if they don't, there's plenty of data around.
I think (b) isn't a good response if you think that transformative AI will probably need to be human brain sized and you believe the scaling laws and you think that short-horizon training won't be enough. (Because then we'll need something like 10^30+ FLOP to train TAI, which is plausibly reachable in 20 years but probably not in 10. Tha... (read more)
OK. I found the analogy to insecure software helpful. Followup question: Do you feel the same way about "thinking about politics" or "breaking laws" etc.? Or do you think that those sorts of AI behaviors are less extreme, less strange failure modes?
(I didn't find the "...something has gone extremely wrong in a way that feels preventable" as helpful, because it seems trivial. If you pull the pin on a grenade and then sit on it, something has gone extremely wrong in a way that is totally preventable. If you strap rockets to... (read more)
I really don’t want my AI to strategically deceive me and resist my attempts to correct its behavior. Let’s call an AI that does so egregiously misaligned (for the purpose of this post). ... But it feels to me like egregious misalignment is an extreme and somewhat strange failure mode and it should be possible to avoid it regardless of how the empirical facts shake out.
I'd love to hear more about this. To me, "egregious misalignment" feels extremely natural/normal/expected, perhaps due to convergent instrumental goals. ... (read more)
But it feels to me like egregious misalignment is an extreme and somewhat strange failure mode and it should be possible to avoid it regardless of how the empirical facts shake out.
Paul, this seems a bizarre way to describe something that we agree is the default result of optimizing for almost anything (eg paperclips). Not only do I not understand what you actually did mean by this, it seems like phrasing that potentially leads astray other readers coming in for the first time. Say, if you imagine somebody at Deepmind coming in without a lot of... (read more)
I think I'm responding to a more basic intuition, that if I wrote some code and it's now searching over ingenious ways to kill me, then something has gone extremely wrong in a way that feels preventable. It may be the default in some sense, just as wildly insecure software (which would lead to my computer doing the same thing under certain conditions) is the default in some sense, but in both cases I have the intuition that the failure comes from having made an avoidable mistake in designing the software.
In some sense changing this view would change my bott... (read more)
Nice post! I'm interested to hear more about how your methodology differs from others. Does this breakdown seem roughly right?
1. Naive AI alignment: We are satisfied by an alignment scheme that can tell a story about how it works. (This is what I expect to happen in practice at many AI labs.)
2. Typical-Case AI Alignment: We aren't satisfied until we try hard to think of ways our scheme could fail, and still it doesn't seem like failure is the most likely outcome. (This is what I expect the better sort of AI labs, the ones with big well-respected safety tea... (read more)
I don't really think of 3 and 4 as very different, there's definitely a spectrum regarding "plausible" and I think we don't need to draw the line firmly---it's OK if over time your "most plausible" failure mode becomes increasingly implausible and the goal is just to make it obviously completely implausible. I think 5 is a further step (doesn't seem like a different methodology, but a qualitatively further-off stopping point, and the further off you go the more I expect this kind of theoretical research to get replaced by empirical research). I think of it... (read more)
Maybe we won’t restart the inner algorithm from scratch every time we edit it, since it’s so expensive to do so. Instead, maybe once in a while we’ll restart the algorithm from scratch (“re-initialize to random weights” or something analogous), but most of the time, we’ll take whatever data structure holds the AI’s world-knowledge, and preserve it between one version of the inner algorithm and its successor. Doing that is perfectly fine and plausible, but again, the result doesn’t look like evolution;
Incidentally, I think GPT-3 is great evidence that human-legible learning algorithms are up to the task of directly learning and using a common-sense world-model. I’m not saying that GPT-3 is necessarily directly on the path to AGI; instead I’m saying, How can you look at GPT-3 (a simple learning algorithm with a ridiculously simple objective) and then say, “Nope! AGI is way beyond what human-legible learning algorithms can do! We need a totally different path!”?
I think the response would be, "GPT-3 may have learned an aw... (read more)
Note that evolution is not in this picture: its role has been usurped by the engineers who wrote the PyTorch code. This is intelligent design, not evolution!
IMO you should put evolution in the picture, as another part of the analogy! :) Make a new row at the top, with "Genomes evolving over millions of generations on a planet, as organisms with better combinations of genes outcompete others" on the left and "Code libraries evolving over thousands of days in an industry, as software/ANNs with better code outcompete (in the economy, in the academic prestige competition, in the minds of individual researchers) others" on the right. (Or some shortened version of that)
Thanks! Well, I for one am feeling myself get nerd-sniped by this agenda. I'm resisting so far (so much else to do! Besides, this isn't my comparative advantage) but I'll definitely be reading your posts going forward and if you ever want to bounce ideas off me in a call I'd be down. :)
To be meaningful, this requires whole-process feedback: we need to judge thoughts by their entire chain of origination. (This is technically challenging, because the easiest way to implement process-level feedback is to create a separate meta-level which oversees the rest of the system; but then this meta-level would not itself be subject to oversight.)
I'd be interested to hear this elaborated further. It seems to me to be technically challenging but not very; it feels like the sort of thing that we could probably solve with a couple people working ... (read more)
Update: According to this, the human brain is actually getting ~10^7 bits of data every second, although the highest-level conscious awareness is only processing ~50. So insofar as we go with the "tokens" definition, it does seem that the human brain is processing plenty of tokens for its parameter count -- 10^16, in fact, over the course of its lifetime. More than enough! And insofar as we go with the "single pass through the network" definition, which would mean we are looking for about 10^12... then we get a small discrepancy; the max... (read more)
Makes sense. I think we don't disagree dramatically then.
I also think TAI is a less important category for me than x-risk inducing AI.
Also makes sense -- just checking, does x-risk-inducing AI roughly match the concept of "AI-induced potential point of no return" or is it importantly different? It's certainly less of a mouthful so if it means roughly the same thing maybe I'll switch terms. :)
When you say academia looks like a clear win within 5-10 years, is that assuming "academia" means "starting a tenure-track job now?" If instead one is considering whether to begin a PhD program, for example, would you say that the clear win range is more like 10-15 years?
Also, how important is being at a top-20 institution? If the tenure track offer was instead from University of Nowhere, would you change your recommendation and say go to industry?
Would you agree that if the industry project you could work on is the one that will eventually build TAI (or be one of the leading builders, if there are multiple) then you have more influence from inside than from outside in academia?
I'm more interested in feedback on the +12 OOMs one because it's more decision-relevant. It's more of a fuzzy thing, not crunchy logic like the first one I recommended, and therefore less suitable for your purposes (or so I thought when I first answered your question; now I am not sure).
Insofar as you want to do others of mine, my top recommendation would be this one since it got less feedback than I expected and is my most important timelines-related post of all time IMO.
This list of benefits logically pushed multiple people to argue that we should make AI Alignment paradigmatic.
Who? It would be helpful to have some links so I can go read what they said.
I disagree. Or to be more accurate, I agree that we should have paradigms in the field, but I think that they should be part of a bigger epistemological structure. Indeed, a naive search for a paradigm either results in a natural-science-like paradigm that puts too little emphasis on applications and usefulness, or in a premature constraint on the problem we’re try
Welp, this scoops a bunch of the stuff in my "Why acausal trade matters" chapter. :D Nice!
The DDT idea amuses me. I guess it's maybe the best shot we have, but boy do I get a sense of doom when I imagine that the fate of the world depends on our ability to control/steer/oversee AIs as they become more capable than us in many important ways via keeping them dumb in various other important ways. I guess there's that thing the crocodile wrestlers do where you hold their mouth shut since their muscles for opening are much weaker than their ... (read more)
One way of looking at DDT is "keeping it dumb in various ways." I think another way of thinking about it is just designing a different sort of agent, which is "dumb" according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they're just sacrificing utility for no reason.
There is some slight awkwardness in that the decision problems agents in this universe actually encounter means that UDT agents will get higher utility than DDT agents.
I agree that the maximum a posterior world doesn't help that much, but I think there is some sense in which "having uncertainty" might be undesirable.
OK, fair enough.
I disagree with Eliezer here:
Chance of discovering or verifying long-term solution(s): I’m not sure whether a “one shot” solution to alignment (that is, a single relatively “clean” algorithm which will work at all scales including for highly superintelligent models) is possible. But if it is, it seems like starting to do a lot of work on aligning narrowly superhuman models probably allows us to discover the right solution sooner than we otherwise would have.
Eliezer Yudkowsky: It's not possible. Not for us, any
I think I agree with Eliezer here, but I'm worried I misunderstand something:
Eliezer Yudkowsky: "Pessimal" is a strange word to use for this apt description of humanity's entire experience with ML to date. Unless by "generalize" you mean "generalize correctly to one new example from the same distribution" rather than "generalize the underlying concept that a human would".
Ajeya Cotra: I used "pessimal" here in the technical sense that it's assuming if there are N generalizations equally
Hmmm, it seems we aren't on the same page. (The argument sketch you just made sounds to me like a collection of claims which are either true but irrelevant, or false, depending on how I interpret them.) I'll go back and reread Ajeya's report (or maybe talk to her?) and then maybe we'll be able to get to the bottom of this. Maybe my birds/brains/etc. post directly contradicts something in her report after all.
(Btw, I have similar feelings about the non-Neuromorph answers too; but "idk I'm not really compelled by this" didn't seem like a particularly constructive comment.)
On the contrary, I've been very (80%?) surprised by the responses so far -- in the Elicit poll, everyone agrees with me! I expected there to be a bunch of people with answers like "10%" and "20%" and then an even larger bunch of people with answers like "50%" (that's what I expected you, Ajeya, etc. to chime in and say). Instead, wel... (read more)
At this point I guess I just say I haven't looked into the worm literature enough to say. I can't tell from the post alone whether we've neuromorphed the worm yet or not.
"Qualitatively as impressive as a worm" is a pretty low bar, I think. We have plenty of artificial neural nets that are much more impressive than worms already, so I guess the question is whether we can make one with only 302 neurons that is as impressive as a worm... e.g. can it wriggle in a way that moves it around, can it move away from sources of damage and to... (read more)
Good question! Here's my answer:
--I think Neuromorph has the least chance of succeeding of the five. Still more than 50% though IMO. I'm not at all confident in this.
--Neuromorph =/= an attempt to create uploads. I would be extremely surprised if the resulting AI was recognizably the same person as was scanned. I'd be mildly surprised if it even seemed human-like at all, and this is conditional on the project working. What I imagine happening conditional on the project working is something like: After a few generations of selection, we get ... (read more)
AlphaStar had 10^8 parameters, ten times smaller than a honeybee brain. I think this puts its capabilities in perspective. Yes, it seemed to be more of a heuristic-executor than a long-term planner, because it could occasionally be tricked into doing stupid things repeatedly. But the same is true for insects.
 This is definitely true for Transformers (and LSTMs I think?), but it may not be true for whatever architecture AlphaStar uses. In particular some people I talked to worry that the vanishing gradients problem might make bigger RL models like OmegaStar actually worse. However, everyone I talked to agreed with the “probably”-qualified version of this claim. I’m very interested to learn more about this.
 To avoid catastrophic forgetting, let’s train OmegaStar on all these different games simultaneously, e.g. it plays game A for a short period, then plays game B, then C, etc. and loops back to game A only much later.
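For concreteness, here's a minimal sketch of that kind of interleaved training (a stand-in PyTorch model and fake batches, nothing like the real OmegaStar setup):

```python
# Minimal sketch of interleaved multi-task training to mitigate catastrophic
# forgetting: cycle game_A -> game_B -> game_C -> game_A -> ... so that no
# task goes unrevisited for long. Model, tasks, and data are placeholders.
import itertools
import torch
import torch.nn as nn

model = nn.Linear(32, 4)                       # stand-in for the real network
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def fake_batch():
    """Stand-in for a batch sampled from one game/task."""
    return torch.randn(16, 32), torch.randint(0, 4, (16,))

tasks = {"game_A": fake_batch, "game_B": fake_batch, "game_C": fake_batch}

# A short burst on each task, then move on; we loop back to game_A only after
# visiting the others, rather than training to convergence on one game first.
for name, get_batch in itertools.islice(itertools.cycle(tasks.items()), 300):
    x, y = get_batch()
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```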
Lukas Finnveden points out that Gwern’s extrapolation is pretty weird. Quoting Lukas: “Gwern takes GPT-3's current performance on LAMBADA; assumes that the loss will fall as fast as it does on "predict-the-next-word" (despite the fact that the LAMBADA loss is currently falling much faster!) and extrapolates current performance (without adjusting for the expected change in scaling law after the crossover point) until the point where the AI is as good as humans (and btw we don't have a source for the stated human performance)
I'd endorse a summary more li... (read more)
 One might worry that the original paper had a biased sample of tasks. I do in fact worry about this. However, this paper tests GPT-3 on a sample of actual standardized tests used for admission to colleges, grad schools, etc. and GPT-3 exhibits similar performance (around 50% correct), and also shows radical improvement over smaller versions of GPT.