Recent Discussion

I'm worried that many AI alignment researchers and other LWers have a view of how human morality works that really only applies to a small fraction of all humans (notably moral philosophers and themselves). In this view, people know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs. Here's an example of someone who fits this view:

I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which can travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and

...

I'm leaning towards the more ambitious version of the project of AI alignment being about corrigible anti-goodharting, with the AI optimizing towards good trajectories within the scope of relatively well-understood values, preventing overoptimized weird/controversial situations, even at the cost of astronomical waste. Absence of x-risks, including AI risks, is generally good. Within this environment, the civilization might eventually be able to work out more about values, expanding the scope of their definition and allowing stronger optimization. Corrigibility is then about continually picking up the values and their implied scope from the predictions of how they would've been worked out some time in the future.

1romeostevensit3hYou may not be interested in mutually exclusive compression schemas, but mutually exclusive compression schemas are interested in you. One nice thing is that, given that the schemas use an arbitrary key to handshake with, there is hope that they can be convinced to all get on the same arbitrary key without loss of useful structure.

This is a transcription of Eliezer Yudkowsky responding to Paul Christiano's "Takeoff Speeds" live on Sep. 14, followed by a conversation between Eliezer and Paul. This discussion took place after Eliezer's conversation with Richard Ngo.

 

Color key:

 Chat by Paul and Eliezer  Other chat  Inline comments 

 

5.5. Comments on "Takeoff Speeds"

 

[Yudkowsky][10:14]  (Nov. 22 follow-up comment) 

(This was in response to an earlier request by Richard Ngo that I respond to Paul on Takeoff Speeds.)

[Yudkowsky][16:52] 

maybe I'll try liveblogging some https://sideways-view.com/2018/02/24/takeoff-speeds/ here in the meanwhile

 

Slower takeoff means faster progress

[Yudkowsky][16:57] 


The main disagreement is not about what will happen once we have a superintelligent AI, it’s about what will happen before we have a superintelligent AI. So slow takeoff seems to mean that AI has a larger impact on the world, sooner.

It seems to me to be

...
3Lukas Finnveden6dNitpick: I think that particular analogy isn't great. For nuclear stuff, we have two state variables: amount of fissile material and current number of neutrons flying around. The amount of fissile material determines the "neutron multiplication factor", but it is the number of neutrons that goes crazy, not fissile material. And the current number of neutrons doesn't matter for whether the pile will eventually go crazy or not. But in the simplest toy models of RSI, we just have one variable: intelligence. We can't change the "intelligence multiplication factor", there's just intelligence figuring out how to build more intelligence. Maybe an exothermic chemical reaction, like fire, is a better analogy. Either you have enough heat to create a self-sustaining reaction, or you don't.
7Paul Christiano6dI'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper [https://arxiv.org/pdf/2009.03393.pdf]. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end up disagreeing. Looking towards more ambitious benchmarks, I think that the IMO grand challenge [https://imo-grand-challenge.github.io/] is currently significantly more than 5 years away. In 5 years' time my median guess (without almost any thinking about it) is that automated solvers can do 10% of non-geometry, non-3-variable-inequality IMO shortlist problems. So yeah, I'm happy to play ball in this area, and I expect my predictions to be somewhat more right than yours after the dust settles. Is there some way of measuring such that you are willing to state any prediction? (I still feel like I'm basically looking for any predictions at all beyond sometimes saying "my model wouldn't be surprised by <vague thing X>", whereas I'm pretty constantly throwing out made-up guesses which I'm happy to refine with more effort. Obviously I'm going to look worse in retrospect than you if we keep up this way though, that particular asymmetry is a lot of the reason people mostly don't play ball. ETA: that's a bit unfair, the romantic chatbot vs self-driving car prediction is one where we've both given off-the-cuff takes.)
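(For concreteness, here is a minimal sketch of the kind of straight-line extrapolation being described. The benchmark numbers below are made up for illustration, not taken from the gpt-f paper; the method is just fitting a line to the log-odds of a solve rate over time and extrapolating.)

```python
import numpy as np

# Hypothetical yearly solve rates on a theorem-proving benchmark.
# These numbers are illustrative only, not real gpt-f results.
years = np.array([2018, 2019, 2020, 2021])
solve_rate = np.array([0.05, 0.11, 0.21, 0.36])

# Fit a straight line to the log-odds of the solve rate vs. time, then extrapolate.
log_odds = np.log(solve_rate / (1 - solve_rate))
slope, intercept = np.polyfit(years, log_odds, 1)

for future_year in (2023, 2025):
    pred = 1 / (1 + np.exp(-(slope * future_year + intercept)))
    print(f"{future_year}: extrapolated solve rate ~ {pred:.0%}")
```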
8Eliezer Yudkowsky6dI have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read. I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict. Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events? As I have trouble manipulating my Paul model, it does not exclude Paul saying, "Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn't a lot of money there." Though I haven't even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5yrs, I am more interested yet - it legit decrements my belief, but not nearly as much as I imagine it would decrement yours. (Edit: Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me? Am I now being a better epistemic citizen, if maybe not a good one by your lights?)
15Paul Christiano6dYes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though it could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect it to get you that far. I don't think this is an easy prediction to extract from a trendline, in significant part because you can't extrapolate trendlines this early that far out. So this is stress-testing different parts of my model, which is fine by me. At the meta-level, this is the kind of thing I'm looking for, though I'd prefer to have some kind of quantitative measure of how not-surprised you are. If you are only saying 2% then we probably want to talk about things less far in your tails than the IMO challenge.
14Eliezer Yudkowsky6dOkay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025. The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here. So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse? Pretty bare portfolio but it's at least a start in both directions. If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024? I also don't know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later - it doesn't sound like it's from looking at a graph - and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here. EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I'm not taking on.
18Paul Christiano6dI think IMO gold medal could be well before massive economic impact, I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it's hard reasoning about the tails. I'd say <4% on end of 2025. I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are so surprising. If I'm at 4% and you are 12% and we had 8 such bets, then I can get a factor of 2 if they all come out my way, and you get a factor of ~1.5 if one of them comes out your way. I might think more about this and get a more coherent probability distribution, but unless I say something else by end of 2021 you can consider 4% by end of 2025 as my prediction.
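(A quick check of the "factor of 2" / "factor of ~1.5" arithmetic, under the assumption that the eight hypothetical bets are independent and scored by the likelihood ratio between the two stated probabilities:)

```python
# Likelihood-ratio bookkeeping for 8 independent events where one forecaster
# assigns 4% and the other 12% to each event occurring.
p_low, p_high, n = 0.04, 0.12, 8

# If none of the events happen, the 4% forecaster's predictions fare better:
factor_low = ((1 - p_low) / (1 - p_high)) ** n
print(f"all 8 resolve 'no':  low forecaster gains a factor of {factor_low:.2f}")   # ~2.0

# If exactly one event happens, the 12% forecaster comes out ahead:
factor_high = (p_high / p_low) * ((1 - p_high) / (1 - p_low)) ** (n - 1)
print(f"one resolves 'yes': high forecaster gains a factor of {factor_high:.2f}")  # ~1.6
```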
20Eliezer Yudkowsky5dMaybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend? Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general? Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime? Added: This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics would not consider you one of their own) where they saunter around saying "X will not occur in the next 5 / 10 / 20 years" and they're often right for the next couple of years, because there's only one year where X shows up for any particular definition of that, and most years are not that year; but also they're saying exactly the same thing up until 2 years before X shows up, if there's any early warning on X at all. It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn't completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that definitely couldn't be done in 2 years, and they all fell silent, and then a single one of them named Winograd schemas, they were right that Winograd schemas at the stated level didn't fall within 2 years, but very barely so (they fell the year after). So part of what I'm flailingly asking here, is whether you think you have reliable and sensitive Nope Vision that extends out beyond 2 years, in general, such that you can go on saying "Not for 4 years" up until we are actually within 6 years of the thing, and then, you think, your Nope Vision will actually flash an alert and you will change your tune, before you are actually within 4 years of the thing. Or maybe you think you've got Nope Vision extending out 6 years? 10 years? Or maybe theorem-proving is just a special case and usually your Nope Vision would be
8Paul Christiano4dI think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict. There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orders of magnitude, and there is likely lots of low-hanging fruit to pick, and we just don't have much to extrapolate from (compared to more mature technologies, or how I expect AI will be shortly before the end of days), and for similar reasons there aren't really any benchmarks to extrapolate. (Also note that it matters a lot whether you know what problems labs will try to take a stab at. For the purpose of all of these forecasts, I am trying insofar as possible to set aside all knowledge about what labs are planning to do, though that's obviously not incentive-compatible and there's no particular reason you should trust me to do that.)
8Matthew Barnett6dPossibly helpful: Metaculus currently [https://www.metaculus.com/questions/6728/ai-wins-imo-gold-medal/] puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039. I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.
4Paul Christiano3hI think Metaculus is closer to Eliezer here: conditioned on this problem being resolved it seems unlikely for the AI to be either open-sourced or easily reproducible.

My honest guess is that most predictors didn’t see that condition and the distribution would shift right if someone pointed that out in the comments.

9Eliezer Yudkowsky6dHa! Okay then. My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul? EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.
10Paul Christiano3dI don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting. Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and modest effort). Together with modest progress in deep learning based methods, and a somewhat serious effort, I wouldn't be surprised by people getting up to 20-40% of problems. The bronze cutoff is usually 3/6 problems, and the gold cutoff is usually 5/6 (assuming the AI doesn't get partial credit). The difficulty of problems also increases very rapidly for humans---there are often 3 problems that a human can do more-or-less mechanically. I could tighten any of these estimates by looking at the distribution more carefully rather than going off of my recollections from 2008, and if this was going to be one of a handful of things we'd bet about I'd probably spend a few hours doing that and some other basic digging.
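(A toy calculation of why gold is so much stronger a claim than bronze under those per-problem numbers. The independence assumption here is added purely for illustration and is not part of Paul's reasoning; real IMO problems also vary sharply in difficulty, as he notes.)

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of solving at least k of n problems, each solved independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_solve = 0.3  # illustrative per-problem solve rate, in the 20-40% range mentioned above
print(f"P(>= 3/6, roughly the bronze cutoff): {p_at_least(3, 6, p_solve):.1%}")  # ~26%
print(f"P(>= 5/6, roughly the gold cutoff):   {p_at_least(5, 6, p_solve):.1%}")  # ~1%
```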
7Paul Christiano3hI looked at a few recent IMOs to get better calibrated. Might be good to make some side bets: * Conditioned on winning I think it's only maybe 20-30% probability to get all 6 problems (whereas I think you might have a higher probability on jumping right past human level, or at least have 50% on 6 vs 5?). * Conditioned on a model getting a silver I feel like we have a pretty good guess about what algorithm will be SOTA on this problem (e.g. I'd give 50% to a pretty narrow class of algorithms with some uncertain bells and whistles, with no inside knowledge). Whereas I'd guess you have a much broader distribution. But more useful to get other categories of bets. (Maybe in programming, investment in AI, economic impact from robotics, economic impact from chatbots, translation?)
2Matthew Barnett6dIf this task is bad for operationalization reasons, there are other theorem proving benchmarks [https://paperswithcode.com/task/automated-theorem-proving]. Unfortunately it looks like there aren't a lot of people that are currently trying to improve on the known benchmarks, as far as I'm aware. The code generation benchmarks [https://paperswithcode.com/task/code-generation] are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark [https://arxiv.org/pdf/2105.09938v3.pdf], which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (Github link [https://github.com/hendrycks/apps]).
4Matthew Barnett6dIt feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equivalent of a Gold medal). The relevant question is which rank/percentile you think is likely to be attained by 2025 under your model but you predict would be implausible under Paul's model. This may be a daunting task, but one way to get started is to put a probability distribution over what you think the state-of-the-art will look like by 2025, and then compare to Paul's. Edit: Here are, for example, the individual rankings for 2021: https://www.imo-official.org/year_individual_r.aspx?year=2021 [https://www.imo-official.org/year_individual_r.aspx?year=2021]
3Rob Bensinger6dMy model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'. My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the blue'. Again, I may be wrong, but my intuition is that you're Paul-omorphizing Eliezer when you assume that >16% probability of huge progress in X by year Y implies >50% probability of smaller-but-meaningful progress in X by year Y.
1Rob Bensinger6d(Ah, EY already replied.)
7Eliezer Yudkowsky6dI expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%. Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict. Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier. I frame no prediction about whether Paul is under 16%. That's a separate matter. I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.
13Paul Christiano7dThis seems totally bogus to me. It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them. To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care about. That's fine, you can propose something other than $ or measurable benchmarks. If you can't propose anything, I'm skeptical. My basic guess is that you probably can't effectively predict $ or benchmarks or anything else quantitative. If you actually agreed with me on all that stuff, then I might suspect that you are equivocating between a gradualist-like view that you use for making predictions about everything near term and then switching to a more bizarre perspective when talking about the future. But fortunately I think this is more straightforward, because you are basically being honest when you say that you don't understand how the gradualist perspective makes predictions.
13Eliezer Yudkowsky6dI kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me). But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life." We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years. I think there's a very real sense in which, yes, what we're interested in are milestones, and often milestones that aren't easy to define even after the fact. GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk about that with somebody who thinks that perplexity loss is smooth? I can handwave statements like "GPT-3 started to be useful without retraining via just prompt engineering" but qualitative statements like those aren't good for betting, and it's much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards. But you say - I think? - that you were less shocked by this sort of thing than I am. So, I mean, can you prophesy to us about milestones and headlines in the next five years? I think I kept thinking this during our dialogue, but never saying it, because it seemed like such an unfair demand to make! But it's also part of the whole point that AGI and superintelligence and the world ending are all qualitative milestones like that. Whereas such trend points as Moravec was readily able to forecast correctly - like 10 teraops / plausibly-human-equivalent-computation being available in a $10 million supercomputer around 2010 - are really entirely unanchored from AGI, at least relative to our current knowledge about AGI. (They would be anchored if we'd seen other planets go through this, but we haven't.)
5Matthew Barnett6dDon't you think you're making a falsifiable prediction here? Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.
3Eliezer Yudkowsky6d(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)
4Paul Christiano7dMan, the problem is that you say the "jump to newly accessible domains" will be the thing that lets you take over the world. So what's up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn't help if you say "I expect new things to sometimes become possible" if you don't further say something about the impact of the very early versions of the product. If e.g. people were spending $1B/year developing a technology, and then after a while it jumps from 0/year to $1B/year of profit, I'm not that surprised. (Note that machine translation is radically smaller than this, I don't know the numbers.) I do suspect they could have rolled out a crappy version earlier, perhaps by significantly changing their project. But why would they necessarily bother doing that? For me this isn't violating any of the principles that make your stories sound so crazy. The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world). (Note: it is surprising if an industry is spending $10T/year on R&D and then jumps from $1T --> $10T of revenue in one year in a world that isn't yet growing crazily. The surprisingness depends a lot on the numbers involved, and in particular on how valuable it would have been to deploy a worse version earlier and how hard it is to raise money at different scales.)
5Eliezer Yudkowsky7dWould you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?
4Paul Christiano7dIt's not a description of hominids at all, no one spent any money on R&D. I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).
6Adele Lopez7dSpending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I'm not sure why. My guess is that you think natural selection isn't relevantly similar because it didn't deliberately plan to allocate resources as part of a long bet that it would pay off big.
3Paul Christiano7dI think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead). If someone wants to spell out why they think evolution of hominids means takeoff is fast then I'm usually happy to explain why I disagree with their particular analogy. I think this happens in the next discord log between me and Eliezer.
6Paul Christiano7dI'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense in either, but I can guess. I'm not sure what "wouldn't be especially surprised" means, I think to actually get disagreements we need way more resolution than that so one question is whether you are willing to play ball (since presumably you'd also have to looking into to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd still need more resolution. (That said: what's up for grabs here are predictions about the future, not present.) I'd guess that machine translation is currently something like $100M/year in value, and will scale up more like 2x/year than 10x/year as DL improves (e.g. most of the total log increase will be in years with <3x increase rather than >3x increase, and 3 is like the 60th percentile of the number for which that inequality is tight). I'd guess that increasing deployment of romantic chatbots will end up with technical change happening first followed by social change second, so the speed of deployment and change will depend on the speed of social change. At early stages of the social change you will likely see much larger investment in fine-tuning for this use case, and the results will be impressive as you shift from random folks doing it to actual serious efforts. The fact that it's driven by social rather than technical change means it could proceed at very different paces in different countries. I don't expect anyone to make a lot of profit from this before self-driving cars, for example I'd be pretty surprised if this surpassed $1B/year of revenue before self-driving cars passed $10B/year of revenue. I have no idea what's happening in China. It would be fairly surprising to me if there was currently an actually-compelling version of the technology---which we could try to operationalize as something like how ba
5Eliezer Yudkowsky7dThanks for continuing to try on this! Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. I also remark that I wouldn't be surprised to hear that waifutech is already past $1B/yr in China, but I haven't looked into things there. I don't expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there's particular things that waifutech won't be able to do? My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it. Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually? I think we're still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.
3Paul Christiano7dYes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of "users" which has even more definitional issues. You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year. (I don't know what AI freedom-of-the-road means, do you mean something significantly more ambitious than self-driving trucks or taxis?)
4Daniel Kokotajlo7dWow, I did not expect those results!
7Ramana Kumar7dI wonder what effect there is from selecting for reading the third post in a sequence of MIRI conversations from start to end and also looking at the comments and clicking links in them.

This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky on AGI forecasting, following up on Paul and Eliezer's "Takeoff Speeds" discussion.

 

Color key:

 Chat by Paul and Eliezer  Chat by Ajeya  Inline comments 

 

8. September 20 conversation

 

8.1. Chess and Evergrande

 

[Christiano][15:28] 

 I still feel like you are overestimating how big a jump alphago is, or something. Do you have a mental prediction of how the graph of (chess engine quality) vs (time) looks, and whether neural net value functions are a noticeable jump in that graph?

Like, people investing in "Better Software" doesn't predict that you won't be able to make progress at playing go. The reason you can make a lot of progress at go is that there was extremely little investment in playing better go.

So then

...
4Vanessa Kosoy15hChristiano predicts progress will be (approximately) a smooth curve, whereas Yudkowsky predicts there will be discontinuous-ish "jumps", but there's another thing that can happen that both of them seem to dismiss: progress hitting a major obstacle and plateauing for a while (i.e. the progress curve looking locally like a sigmoid). I guess that the reason they dismiss it is related to this quote [https://www.alignmentforum.org/posts/CpvyhFy9WvCNsifkY/discussion-with-eliezer-yudkowsky-on-agi-interventions] by Soares: However, I think this is not entirely accurate. Some games are still unsolved without "cheating", where by cheating I mean using human demonstrations or handcrafted rewards, and that includes Montezuma's Revenge, StarCraft II and Dota 2 (and Dota 2 with unlimited hero selection is even more unsolved). Moreover, we haven't seen RL show superhuman performance on any task in which the environment is substantially more complex than the agent in important ways (this rules out all video games, unless winning the game requires a good theory of mind of your opponents[1], which is arguably never the case for zero-sum two-player games). Language models made impressive progress, but I don't think they are superhuman along any interesting dimension. Classifiers still struggle with adversarial examples (although this is not necessarily an important limitation; maybe humans have "adversarial examples" too). So, it is certainly possible that it's a "clear runway" from here to superintelligence. But I don't think it's obvious. [1] I know there are strong poker AIs, but I suspect they win via something other than theory of mind. Maybe someone who knows the topic can comment.
1Rob Bensinger14hMy Eliezer-model is a lot less surprised by lulls than my Paul-model (because we're missing key insights for AGI, progress on insights is jumpy and hard to predict, the future is generally very unpredictable, etc.). I don't know exactly how large of a lull or winter would start to surprise Eliezer (or how much that surprise would change if the lull is occurring two years from now, vs. ten years from now, for example). In Yudkowsky and Christiano Discuss "Takeoff Speeds" [https://www.lesswrong.com/posts/vwLxd6hhFvPbvKmBH/yudkowsky-and-christiano-discuss-takeoff-speeds] , Eliezer says: So in that sense Eliezer thinks we're already in a slowdown to some degree (as of 2020), though I gather you're talking about a much larger and more long-lasting slowdown.

I generally expect smoother progress, but predictions about lulls are probably dominated by Eliezer's shorter timelines. Also lulls are generally easier than spurts, e.g. I think that if you just slow investment growth you get a lull and that's not too unlikely (whereas part of why it's hard to get a spurt is that investment rises to levels where you can't rapidly grow it further).

1Vanessa Kosoy13hMakes some sense, but Yudkowsky's prediction that TAI will arrive before AI has large economic impact does forbid a lot of plateau scenarios. Given a plateau that's sufficiently high and sufficiently long, AI will land in the market, I think. Even if regulatory hurdles are the bottleneck for a lot of things atm, eventually in some country AI will become important and the others will have to follow or fall behind.
13Rob Bensinger2dFound two Eliezer-posts from 2016 (on Facebook) that I feel helped me better grok his perspective. Sep. 14, 2016 [https://www.facebook.com/yudkowsky/posts/10154575811294228]: And earlier, Jan. 27, 2016 [https://www.facebook.com/yudkowsky/posts/10153914357214228]:
5Rob Bensinger4dTranscript error fixed -- the lines that previously read
[Yudkowsky][17:40] I expect it to go away before the end of days but with there having been a big architectural innovation, not Stack More Layers
[Christiano][17:40] I expect it to go away before the end of days but with there having been a big architectural innovation, not Stack More Layers
[Yudkowsky][17:40] if you name 5 possible architectural innovations I can call them small or large
should be
[Yudkowsky][17:40] I expect it to go away before the end of days but with there having been a big architectural innovation, not Stack More Layers
[Christiano][17:40] yeah whereas I expect layer stacking + maybe changing loss (since logprob is too noisy) is sufficient
[Yudkowsky][17:40] if you name 5 possible architectural innovations I can call them small or large
4Paul Christiano6d(ETA: this wasn't actually in this log but in a future part of the discussion.) I found the elephants part of this discussion surprising. It looks to me like human brains are better than elephant brains at most things, and it's interesting to me that Eliezer thought otherwise. This is one of the main places where I couldn't predict what he would say.
5Eliezer Yudkowsky6dI also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?
2Paul Christiano6dOops, this was in reference to the later part of the discussion where you disagreed with "a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]".
12Lukas Finnveden6dIf Eliezer endorses this on reflection, that would seem to suggest that Paul actually has good models about how often trend breaks happen, and that the problem-by-Eliezer's-lights is relatively more about, either: * that Paul's long-term predictions do not adequately take into account his good sense of short-term trend breaks. * that Paul's long-term predictions are actually fine and good, but that his communication about it is somehow misleading to EAs. That would be a very different kind of disagreement than I thought this was about. (Though actually kind-of consistent with the way that Eliezer previously didn't quite diss Paul's track-record, but instead dissed "the sort of person who is taken in by this essay [is the same sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2]"?) Also, none of this erases the value of putting forward the predictions mentioned in the original quote, since that would then be a good method of communicating Paul's (supposedly miscommunicated) views.
12johnswentworth7dSome thinking-out-loud on how I'd go about looking for testable/bettable prediction differences here... I think my models overlap mostly with Eliezer's in the relevant places, so I'll use my own models as a proxy for his, and think about how to find testable/bettable predictions with Paul (or Ajeya, or someone else in their cluster). One historical example immediately springs to mind where something-I'd-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve [https://en.wikipedia.org/wiki/Phillips_curve]. The original Phillips curve was based on just fitting a curve to inflation-vs-unemployment data; Friedman and Phelps both independently came up with theoretical models for that relationship in the late sixties ('67-'68), and Friedman correctly forecasted that the curve would break down in the next recession (i.e. the "stagflation" of '73-'75). This all led up to the Lucas Critique [https://en.wikipedia.org/wiki/Lucas_critique], which I'd consider the canonical case-against-what-I'd-call-Paul-esque-worldviews within economics. The main idea which seems transportable to other contexts is that surface relations (like the Phillips curve) break down under distribution shifts in the underlying factors. So, how would I look for something analogous to that situation in today's AI? We need something with an established trend, but where a distribution shift happens in some underlying factor. One possible place to look: I've heard that OpenAI plans to make the next generation of GPT not actually much bigger than the previous generation; they're trying to achieve improvement through strategies other than Stack More Layers. Assuming that's true, it seems like a naive Paul-esque model would predict that the next GPT is relatively unimpressive compared to e.g. the GPT2 -> GPT 3 delta? Whereas my models (or I'd guess Eliezer's models) would predict that it's relatively more impressive, compared to the expectations of Paul-esque models (derived
8Rohin Shah2dThe "continuous view" as I understand it doesn't predict that all straight lines always stay straight. My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight). In its application to AI, this is combined with a prediction that people will in fact be putting in lots of effort into making AI systems intelligent / powerful / able to automate AI R&D / etc, before AI has reached a point where it can execute a pivotal act. This second prediction comes for totally different reasons, like "look at what AI researchers are already trying to do" combined with "it doesn't seem like AI is anywhere near the point of executing a pivotal act yet". (I think on Paul's view the second prediction is also bolstered by observing that most industries / things that had big economic impacts also seemed to have crappier predecessors. This feels intuitive to me but is not something I've checked and so isn't my personal main reason for believing the second prediction.) I'm not very familiar with this (I've only seen your discussion and the discussion in IEM) but it does not seem like the sort of thing where the argument I laid out above would have had a strong opinion. Was the y-axis of the straight line graph a metric that people were trying to optimize? If so, did the change in policy not represent a change in the amount of effort put into optimizing the metric? (I haven't looked at the details here, maybe the answer is yes to both, in which case I would be interested in looking at the details.) This seems plausible but it also seems like you can apply the above argument to a bunch of other topics besides GDP, like the ones listed in this comment [https://w
8Samuel Dylan Martin2dOne of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently. My very rough model of how their beliefs flow forward is: PAUL Low initial confidence on truth/coherence of 'core of generality' → Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loosely analogous individual historical example. Natural selection wasn't intelligently aiming for powerful world-affecting capabilities, and so stumbled on them relatively suddenly with humans. Therefore, we learn very little about whether there will/won't be a spectrum of powerful intermediately general AIs from the historical case of evolution - all we know is that it didn't happen during evolution, and we've got good reasons to think it's a lot more likely to happen for AI. For other reasons (precedents already exist - MuZero is insect-brained but better at chess or go than a chimp, plus that's the default with technology we're heavily investing in), we should expect there will be powerful, intermediately general AIs by default (and our best guess of the timescale should be anchored to the speed of human-driven progress, since that's where it will start) - No core of generality Then, from there: No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class → Qualitative prediction of more common continuous progress on the 'intelligence' of narrow AI and prediction of continuous takeoff ELIEZER High initial confidence on truth/coherence of 'core of generality' → Even though there are some disanalogies between Evolution and AI progress, the exact details of how closely analogous the two situations are d
8johnswentworth2dThis is super helpful, thanks. Good explanation. With this formulation of the "continuous view", I can immediately think of places where I'd bet against it. The first which springs to mind is aging: I'd bet that we'll see a discontinuous jump in achievable lifespan of mice. The gears here are nicely analogous to AGI too: I expect that [https://www.lesswrong.com/s/3hfjaztptwEt2cCve/p/ui6mDLdqXkaXiDMJ5#Foundations] there's a "common core" (or shared cause) underlying all the major diseases of aging, and fixing that core issue will fix all of them at once, in much the same way that figuring out the "core" of intelligence will lead to a big discontinuous jump in AI capabilities. I can also point to current empirical evidence for the existence of a common core in aging, which might suggest analogous types of evidence to look at in the intelligence context. Thinking about other analogous places... presumably we saw a discontinuous jump in flight range when Sputnik entered orbit. That one seems extremely closely analogous to AGI. There it's less about the "common core" thing, and more about crossing some critical threshold. Nuclear weapons and superconductors both stand out a-priori as places where we'd expect a critical-threshold-related discontinuity, though I don't think people were optimizing hard enough in superconductor-esque directions for the continuous view to make a strong prediction there (at least for the original discovery of superconductors).
5Rohin Shah1dI agree that when you know about a critical threshold, as with nukes or orbits, you can and should predict a discontinuity there. (Sufficient specific knowledge is always going to allow you to outperform a general heuristic.) I think that (a) such thresholds are rare in general and (b) in AI in particular there is no such threshold. (According to me (b) seems like the biggest difference between Eliezer and Paul.) Some thoughts on aging: * It does in fact seem surprising, given the complexity of biology relative to physics, if there is a single core cause and core solution that leads to a discontinuity. * I would a priori guess that there won't be a core solution. (A core cause seems more plausible, and I'll roll with it for now.) Instead, we see a sequence of solutions that intervene on the core problem in different ways, each of which leads to some improvement on lifespan, and discovering these at different times leads to a smoother graph. * That being said, are people putting in a lot of effort into solving aging in mice? Everyone seems to constantly be saying that we're putting in almost no effort whatsoever. If that's true then a jumpy graph would be much less surprising. * As a more specific scenario, it seems possible that the graph of mouse lifespan over time looks basically flat, because we were making no progress due to putting in ~no effort. I could totally believe in this world that someone puts in some effort and we get a discontinuity, or even that the near-zero effort we're putting in finds some intervention this year (but not in previous years) which then looks like a discontinuity. If we had a good operationalization, and people are in fact putting in a lot of effort now, I could imagine putting my $100 to your $300 on this (not going beyond 1:3 odds simply because you know way more about aging than I do).
2Matthew "Vaniver" Graves2dWhile I think orbit is the right sort of discontinuity for this, I think you need to specify 'flight range' in a way that clearly favors orbits for this to be correct, mostly because about a month before was the manhole cover launched/vaporized with a nuke. [https://en.wikipedia.org/wiki/Operation_Plumbbob#Missing_steel_bore_cap] [But in terms of something like "altitude achieved", I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]
6johnswentworth2dMy understanding is that Sputnik was a big discontinuous jump in "distance which a payload (i.e. nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.
2Matthew "Vaniver" Graves2dSo it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I'm not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.) [And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France's distance instead, then the same strategic considerations would have been hit much sooner.]
4Eliezer Yudkowsky6dI don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that. As for their ability to then make algorithmic progress, depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won't work, but maybe they've got enough people trying things to find some? But it's hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.
5Lukas Finnveden6dWhile GPT-4 wouldn't be a lot bigger than GPT-3, Sam Altman did indicate that it'd use a lot more compute. That's consistent with Stack More Layers still working; they might just have found an even better use for compute. (The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)
10Eliezer Yudkowsky6dIf they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.
9Matthew Barnett7dGood Judgment Open had the probability at 65% [https://www.gjopen.com/questions/133-will-google-s-alphago-beat-world-champion-lee-sedol-in-the-five-game-go-match-planned-for-march-2016] on March 8th 2016, with a generally stable forecast since early February (Wikipedia says [https://en.wikipedia.org/wiki/AlphaGo#Match_against_Lee_Sedol] that the first match was on March 9th). Metaculus had the probability at 64% [https://www.metaculus.com/questions/112/will-googles-alphago-beat-go-player-lee-sedol-in-march-2016/] with similar stability over time. Of course, there might be another source that Eliezer is referring to, but for now I think it's right to flag this statement as false.
5Eliezer Yudkowsky6dMy memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was. Neither GJO nor Metaculus are restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%. Here's an example of one such, which I have a potentially false memory of having maybe read at the time: https://www.gjopen.com/comments/118530
1Matthew Barnett6dThanks for clarifying. That makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/is a way to restrict to "superforecasters" in the UI). I will add the following piece of evidence, which I don't think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question [https://www.metaculus.com/questions/112/will-googles-alphago-beat-go-player-lee-sedol-in-march-2016/] , a relatively high fraction of people put a 20% chance, but it also looks like over 80% of forecasters put higher credences.
7Matthew Barnett7dA note I want to add, if this fact-check ends up being valid: It appears that a significant fraction of Eliezer's argument relies on AlphaGo being surprising. But then his evidence for it being surprising seems to rest substantially on something that was misremembered. That seems important if true. I would point to, for example, this quote, "I mean the superforecasters did already suck once in my observation, which was AlphaGo, but I did not bet against them there, I bet with them and then updated afterwards." It seems like the lesson here, if indeed superforecasters got AlphaGo right and Eliezer got it wrong, is that we should update a little bit towards superforecasting, and against Eliezer.
5Ben Pace7dAdding my recollection of that period: some people made the relevant updates when DeepMind's system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going "Oh huh, I think this is going to happen" and then when AlphaGo beat Lee Sedol (in March 2016) everyone said "Now it is happening".
8Matthew Barnett6dIt seems from this Metaculus question [https://www.metaculus.com/questions/45/in-2016-will-an-ai-player-beat-a-professionally-ranked-human-in-the-ancient-game-of-go/] that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia [https://en.wikipedia.org/wiki/AlphaGo_versus_Fan_Hui]). It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability. Perhaps a better source of evidence of AlphaGo's surprisingness comes from Nick Bostrom's 2014 book Superintelligence in which he says, "Go-playing amateur programs have been improving at a rate of about 1 level dan/year in recent years. If this rate of improvement continues, they might beat the human world champion in about a decade." (Chapter 1). This vindicates AlphaGo being an impressive discontinuity from pre-2015 progress. Though one can reasonably dispute whether superforecasters thought that the milestone was still far away after being told that Google and Facebook made big investments into it (as was the case in late 2015 [https://www.wired.com/2015/12/google-and-facebook-race-to-solve-the-ancient-game-of-go/] ).
6Ben Pace6dWow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote. Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]
19Jessica Taylor7dA bunch of this was frustrating to read because it seemed like Paul was yelling "we should model continuous changes!" and Eliezer was yelling "we should model discrete events!" and these were treated as counter-arguments to each other. It seems obvious from having read about dynamical systems that continuous models still have discrete phase changes. E.g. consider boiling water. As you put in energy the temperature increases until it gets to the boiling point, at which point more energy put in doesn't increase the temperature further (for a while), it converts more of the water to steam; after all the water is converted to steam, more energy put in increases the temperature further. So there are discrete transitions from (a) energy put in increases water temperature to (b) energy put in converts water to steam to (c) energy put in increases steam temperature. In the case of AI improving AI vs. humans improving AI, a simple model to make would be one where AI quality is modeled as a variable, a, with the following dynamical equation: da/dt = h + r·a, where h is the speed at which humans improve AI and r is a recursive self-improvement efficiency factor. The curve transitions from a line at early times (where h >> r·a) to an exponential at later times (where r·a >> h). It could be approximated as a piecewise function with a linear part followed by an exponential part, which is a more-discrete approximation than the original function, which has a continuous transition between linear and exponential. This is nowhere near an adequate model of AI progress, but it's the sort of model that would be created in the course of a mathematically competent discourse on this subject on the way to creating an adequate model. Dynamical systems contains many beautiful and useful concepts like basins of attraction [https://en.wikipedia.org/wiki/Attractor#Basins_of_attraction] which make sense of discrete and continuous phenomena simultaneously (i.e. there are a discrete number of basins of at
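(A minimal numerical sketch of the toy model in the comment above, with arbitrary parameter values, showing how the same smooth equation looks linear while the human term dominates and exponential once the self-improvement term takes over:)

```python
# Toy model from the comment: da/dt = h + r*a, where h is the rate of
# human-driven AI improvement and r is a recursive self-improvement factor.
# Parameter values are arbitrary and purely illustrative.
h, r, dt = 1.0, 0.05, 0.01

a = 0.0
for step in range(1, 10001):              # Euler integration out to t = 100
    a += (h + r * a) * dt
    if step in (500, 2000, 5000, 10000):  # report at t = 5, 20, 50, 100
        t = step * dt
        ai_share = r * a / (h + r * a)
        print(f"t={t:5.0f}  a={a:10.1f}  AI share of current progress: {ai_share:.0%}")
```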
7Paul Christiano6d(I'm interested in which of my claims seem to dismiss or not adequately account for the possibility that continuous systems have phase changes.)
8Jessica Taylor6dThis section seemed like an instance of you and Eliezer talking past each other in a way that wasn't locating a mathematical model containing the features you both believed were important (e.g. things could go "whoosh" while still being continuous):

[Christiano][13:46]
Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress

[Yudkowsky][13:46]
unfortunately a lot of what you are saying, from my perspective, has the flavor of, “but can’t you tell me about your predictions earlier on of the impact on global warming at the Homo erectus level”
you have stories about why this is like totally not a fair comparison
I do not share these stories

[Christiano][13:46]
I don’t understand either your objection nor the reductio
like, here’s how I think it works: AI systems improve gradually, including on metrics like “How long does it take them to do task X?” or “How high-quality is their output on task X?”

[Yudkowsky][13:47]
I feel like the thing we know is something like, there is a sufficiently high level where things go whooosh humans-from-hominids style

[Christiano][13:47]
We can measure the performance of AI on tasks like “Make further AI progress, without human input”
Any way I can slice the analogy, it looks like AI will get continuously better at that task
8Paul Christiano6dMy claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it were a lot faster, you would have seen a takeover earlier instead. This claim is true in your model. It also seems true to me about hominids: that is, I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees. I thought Eliezer's comment "there is a sufficiently high level where things go whooosh humans-from-hominids style" was missing the point. It might have been good to offer some quantitative models at that point, though I haven't had much luck with that. I can grant that there are possible models in which the AI moves quickly from "much slower than humans" to "much faster than humans," but I wanted to get some model from Eliezer to see what he had in mind. (I find fast takeoff from various frictions more plausible, so that the question mostly becomes one about how close we are to various kinds of efficient frontiers, and where we respectively predict civilization to be adequate/inadequate or progress to be predictable/jumpy.)
11Paul Christiano6dI don’t really feel like anything you are saying undermines my position here, or defends the part of Eliezer’s picture I’m objecting to. (ETA: but I agree with you that it's the right kind of model to be talking about and is good to bring up explicitly in discussion. I think my failure to do so is mostly a failure of communication.) I usually think about models that show the same kind of phase transition you discuss, though usually significantly more sophisticated models, and moving from exponential to hyperbolic growth (you only get an exponential in your model because of the specific and somewhat implausible functional form for technology in your equation). With humans alone I expect efficiency to double roughly every year based on the empirical returns curves, though it depends a lot on the trajectory of investment over the coming years. I've spent a long time thinking and talking with people about these issues. At the point when the work is largely done by AI, I expect progress to be maybe 2x faster, so doubling every 6 months. And then from there I expect a roughly hyperbolic trajectory over successive doublings. If takeoff is fast I still expect it to most likely be through a similar situation, where e.g. total human investment in AI R&D never grows above 1% and so at the time when takeoff occurs the AI companies are still only 1% of the economy.
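For contrast with the exponential case, here is a minimal sketch of what "a roughly hyperbolic trajectory over successive doublings" can look like; the specific functional form dx/dt = c·x² and the constants are illustrative assumptions on my part, not something Paul commits to above:

```python
# Exponential growth has a constant doubling time; the simple hyperbolic model
# dx/dt = c * x**2 halves the doubling time with every doubling and reaches a
# finite-time singularity at t = 1/(c*x0). Functional form and constants are
# illustrative assumptions only.
import math

def exponential_doubling_times(r, n):
    # x(t) = x0 * exp(r*t): every doubling takes ln(2)/r, forever.
    return [math.log(2) / r] * n

def hyperbolic_doubling_times(c, x0, n):
    # For dx/dt = c*x^2, the time to go from x to 2x is 1/(2*c*x).
    times, x = [], x0
    for _ in range(n):
        times.append(1.0 / (2.0 * c * x))
        x *= 2.0
    return times

print("exponential doubling times:", [round(t, 3) for t in exponential_doubling_times(math.log(2), 6)])
print("hyperbolic  doubling times:", [round(t, 3) for t in hyperbolic_doubling_times(1.0, 0.5, 6)])
```

In a model like this the interesting question is when the doubling time starts shrinking noticeably, rather than whether there is a single discontinuous jump.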
5Matthew Barnett7d+1 on using dynamical systems models to try to formalize the frameworks in this debate. I also give Eliezer points for trying to do something similar in Intelligence Explosion Microeconomics [https://intelligence.org/files/IEM.pdf] (and to people who have looked at this from the macro perspective [https://www.openphilanthropy.org/could-advanced-ai-drive-explosive-economic-growth] ).
17Eliezer Yudkowsky7dThis is a place where I suspect we have a large difference of underlying models. What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers? Particularly if you have an answer to anything that sounds like it's in the style of Gwern's questions [https://www.lesswrong.com/posts/vwLxd6hhFvPbvKmBH/yudkowsky-and-christiano-discuss-takeoff-speeds?commentId=mKgEsfShs2xtaWz4K] , because I think those are the things that actually matter and which are hard to predict from trendlines and which ought to depend on somebody's model of "what kind of generality makes it into GPT-3's successors".
5Paul Christiano6dI agree we seem to have some kind of deeper disagreement here. I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn't use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever. I think these won't get to human level in the next 5 years. We'll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren't currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won't just say "the Future is hard to predict." (Though separately I expect to make somewhat better predictions than you in most of these domains.) A plausible example is that I think it's pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) "on track" to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I'd guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that's 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticality-flavored model of takeoff.
11Paul Christiano6dIf you give me 1 or 10 examples of surface capabilities I'm happy to opine. If you want me to name industries or benchmarks, I'm happy to opine on rates of progress. I don't like the game where you say "Hey, say some stuff. I'm not going to predict anything and I probably won't engage quantitatively with it since I don't think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3." I don't even know which of Gwern's questions you think are interesting/meaningful. "Good meta-learning"--I don't know what this means, but if you ask a real question I can guess. Qualitative descriptions---what is even a qualitative description of GPT-3? "Causality"---I think that's not very meaningful and will be used to describe quantitative improvements at some level made up by the speaker. The spikes in capabilities Gwern talks about seem to be basically measurement artifacts, but if you want to describe particular measurements I can tell you whether they will have similar artifacts. (How much economic value I can talk about, but you don't seem interested.)
12Eliezer Yudkowsky6dMostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in advance, and so stuff just started happening one day. The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" and then I say "Starting in 2022 would not surprise me" by way of making an antiprediction that contradicts them. It may sound bold and startling to them, but from my own perspective I'm just expressing my ignorance. That's one reason why I keep saying, if you think the world is more orderly than that, why not opine on it yourself to get the Bayes points for it - why wait for me to ask you? If you ask me to extend out a rare tendril of guessing, I might guess, for example, that it seems to me that GPT-3's current text prediction-hence-production capabilities are sufficiently good that somewhere inside GPT-3 must be represented a level of understanding which seems like it should also suffice to, for example, translate Chinese to English or vice-versa in a way that comes out sounding like a native speaker, and is recognized as basically faithful to the original meaning. We haven't figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we've already applied using the right loss functions. So there's a qualitative guess at a surface capability we might see soon - but when is "soon"? I don't know; history suggests that even what predictably happens later is extremely hard to time. There are subpredictions ...
7Paul Christiano6dI'm mostly not looking for virtue points, I'm looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some feedback to help snap you out of it. I don't think it's surprising if a GPT-3-sized model can do relatively good translation. If we're talking about this prediction, and if you aren't happy just predicting numbers for overall value added from machine translation, I'd kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.
4Adele Lopez6dIt seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn't seem like you'll be able to find (ii) by looking at predictions for the near future.
6Paul Christiano6dIt seems to me like Eliezer rejects a lot of important heuristics like "things change slowly" and "most innovations aren't big deals" and so on. One reason he may do that is because he literally doesn't know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he'd see that actual gradualists are much better predictors than he imagines.
3Adele Lopez6dThat seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I'd guess that he thinks that e.g. recursive self improvement is one of those things where these heuristics don't apply, and that this is foreseeable because of e.g. the nature of recursion. I'd love to hear more about what sort of knowledge about "operating these heuristics" you think he's missing! Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be "shaken out" of his fast-takeoff view due to successful future predictions (until it's too late).
4Paul Christiano6dHe says things like AlphaGo or GPT-3 being really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight. I agree that after shaking out the other disagreements, we could just end up with Eliezer saying "yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we've applied AI" (or "AI improving AI will be fundamentally unlike automating humans improving AI") but I don't think that's the core of his position right now.

This is how I currently think about higher-order game theory, the study of agents thinking about agents thinking about agents....

This post doesn't add any new big ideas beyond what was already in the post by Diffractor linked above. I just have a slightly different perspective that emphasizes the "metathreat" approach and the role of nondeterminism.

This is a work in progress. There's a bunch of technical work that must be done to make this rigorous. I'll save the details for the last section.

Multiple levels of strategic thinking

Suppose you're an agent with accurate beliefs about your opponents. It doesn't matter where your beliefs come from; perhaps you have experience with these opponents, or perhaps you read your opponents' source code and thought about it. Your beliefs are accurate, although...
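As a toy illustration of "agents thinking about agents", here is a minimal level-k best-response sketch; the payoff matrix and the uniform level-0 policy are arbitrary assumptions for illustration, and this is not the metathreat construction developed in the post:

```python
# A bare-bones "levels of strategic thinking" sketch: a level-(k+1) player
# best-responds to the belief that its opponent plays the level-k policy.
# The payoff matrix and the uniform level-0 policy are arbitrary choices.
import numpy as np

# Row player's payoffs in a 2x2 game (rows = our actions, cols = opponent's actions).
PAYOFF = np.array([[3.0, 0.0],
                   [5.0, 1.0]])

def best_response(opponent_policy):
    # Expected payoff of each of our actions against the opponent's mixed policy,
    # then put all probability on the best action.
    expected = PAYOFF @ opponent_policy
    policy = np.zeros_like(expected)
    policy[np.argmax(expected)] = 1.0
    return policy

policy = np.array([0.5, 0.5])   # level 0: uniform random
for k in range(1, 4):
    policy = best_response(policy)
    print(f"level {k} policy: {policy}")
```

In this particular game the tower stabilizes after one step; in a game like matching pennies the pure best responses cycle forever, which gives some intuition for why one might prefer to work with beliefs (distributions) over opponents rather than a naive tower of best responses.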

1romeostevensit18hTangential, but did you ever happen to read statistical physics of human cooperation?
1Nisan15hNo, I just took a look. The spin glass stuff looks interesting!
1romeostevensit12hAre we talking about the same thing? https://www.sciencedirect.com/science/article/am/pii/S0370157317301424 [https://www.sciencedirect.com/science/article/am/pii/S0370157317301424]

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

1Charlie Steiner1dI have a question about this entirely divorced from practical considerations. Can we play silly ordinal games here? If you assume that the other agent will take the infinite-order policy, but then naively maximize your expected value rather than unrolling the whole game-playing procedure, this is sort of like ω+1. So I guess my question is, if you take this kind of dumb agent (that still has to compute the infinite agent) as your baseline and then re-build an infinite tower of agents (playing other agents of the same level) on top of it, does it reconverge to A_∞ or does it converge to some weird A_{ω2}?
1Nisan1dI think you're saying A_{ω+1} := [ΔA_ω → ΔA_0], right? In that case, since A_0 embeds into A_ω, we'd have A_{ω+1} embedding into A_ω. So not really a step up. If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order. I suppose you could have a hybrid approach: Order ω+1 is allowed to be discontinuous in its order-ω beliefs, but higher orders have to be continuous? Maybe that would get you to ω2.
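For readers following the notation, here is one way to write down the hierarchy being discussed; only the ω+1 line is stated explicitly above, so the rest is a reading of the thread rather than the post's official definitions (A_0 = base actions, ΔX = probability distributions on X):

```latex
% A gloss on the notation in this thread; only the \omega+1 line is stated
% explicitly above. A_0 = base actions, \Delta X = distributions on X.
\begin{align*}
  A_{n+1}      &:= [\Delta A_n \to \Delta A_0] \\
  A_\omega     &:= \text{a limit of the } A_n \text{ (agents responsive to beliefs at every finite order)} \\
  A_{\omega+1} &:= [\Delta A_\omega \to \Delta A_0]
\end{align*}
```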

- 1988 -

Hans Moravec:  Behold my book Mind Children.  Within, I project that, in 2010 or thereabouts, we shall achieve strong AI.  I am not calling it "Artificial General Intelligence" because this term will not be coined for another 15 years or so.

Eliezer (who is not actually on the record as saying this, because the real Eliezer is, in this scenario, 8 years old; this version of Eliezer has all the meta-heuristics of Eliezer from 2021, but none of that Eliezer's anachronistic knowledge):  Really?  That sounds like a very difficult prediction to make correctly, since it is about the future, which is famously hard to predict.

Imaginary Moravec:  Sounds like a fully general counterargument to me.

Eliezer:  Well, it is, indeed, a fully general counterargument against futurism.  Successfully predicting...

Spoiler tags are borked the way I'm using them.

anyway, another place to try your hand at calibration:

Humbali: No. You're expressing absolute certainty in your underlying epistemology and your entire probability distribution

no he isn't, why?

Humbali is asking Eliezer to double-count evidence. Consilience is hard if you don't do your homework on the provenance of each heuristic, rather than naively counting up outputs from sources that themselves also didn't do their homework.

Or in other words: "Do not cite the deep evidence to me, I was there when it was written"

And another ...

3Adele Lopez9hGoing to try answering this one: The uncertainty must already be "priced in" to your probability distribution. So your distribution, and hence your median, shouldn't shift at all unless you actually observe new relevant evidence, of course.
1jacob_cannell9hBiological cells are computers which must copy bits to copy DNA. So we can ask biology: how much energy do cells use to copy each base pair? Seems they use [https://arxiv.org/abs/1706.05043] just 4 ATP per base pair, or 1 ATP/bit, and thus sit within an OOM of the 'Landauer bound'. Which is more impressive if you consider that the typically quoted 'Landauer bound' of kT ln 2 is overly optimistic, as it only applies when the error probability is 50% or the computation takes infinitely long. Useful computation requires speed at least somewhat better than infinitely slow and reliability better than none. The fact that cell replication operates near the Landauer bound already suggests a prior that neurons should be efficient. The Landauer bound at room temp is ~0.03 eV. Given that an electron is something of an obvious minimal unit for an electrical computer, the Landauer bound can be thought of as a 30 mV thermal noise barrier. Digital computers operate at roughly 30x that for speed and reliability, but if you look at neuron swing voltages it's clear they are operating only ~3x or so above the noise voltage (optimizing hard for energy efficiency at the expense of speed). Assuming 1 Hz * 10^14 synapses / 10 watts = 10^13 synops per joule, or about 10^7 electron charges per synop at the Landauer voltage. A synaptic op is at least doing analog signal multiplication, which requires far more energy/charges than a simple binary op - IIRC you need roughly 2^2K carriers, and thus erasures, to have precision equivalent to K-bit digital, so an 8-bit synaptic op (which IIRC is near where digital/analog mult energy intersects) would be 10^4 or 10^5. I had a relevant ref for this, can't find it now (but I think you can derive it from the binomial distribution when std dev/precision is equivalent to 2^-8). Now most synapses are probably smaller/cheaper than 8-bit equiv, but most of the energy cost involved is in pushing data down irreversible dissipative wires (just as true in the brain as it is in a GPU). Now add in the additional ...
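For reference, a quick numeric sanity check of some of these figures, using the comment's own assumptions (300 K, 10 W, ~10^14 synapses firing at ~1 Hz); the constants are standard physical constants:

```python
# Back-of-envelope check of the numbers in the comment above.
# Assumptions taken from the comment: T ≈ 300 K, brain power ≈ 10 W,
# ~1e14 synapses firing at ~1 Hz.
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K
e   = 1.602176634e-19   # elementary charge, C (also J per eV)
T   = 300.0

kT_eV = k_B * T / e                  # ≈ 0.026 eV (≈ 26 mV per electron charge)
landauer_eV = kT_eV * math.log(2)    # ≈ 0.018 eV, the usual kT ln 2 figure
print(f"kT at 300 K      ≈ {kT_eV:.3f} eV")
print(f"kT ln 2 at 300 K ≈ {landauer_eV:.3f} eV")

# Energy per synaptic op implied by the comment's figures:
joules_per_synop = 10.0 / (1.0 * 1e14)            # W / (Hz * synapses) = J per op
electrons_at_30mV = joules_per_synop / (e * 0.030)
print(f"energy per synaptic op ≈ {joules_per_synop:.0e} J")
print(f"≈ {electrons_at_30mV:.1e} electron charges pushed through a 30 mV swing")
```

This reproduces the ~10^7 electron-charge figure; note that kT ln 2 at 300 K is about 18 meV, while the ~30 mV figure quoted above corresponds roughly to kT itself, in line with the comment's point that the bare kT ln 2 bound is optimistic.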
2Adele Lopez8hYou're missing the point! Your arguments apply mostly toward arguing that brains are optimized for energy efficiency, but the important quantity in question is computational efficiency! You even admit that neurons are "optimizing hard for energy efficiency at the expense of speed", but don't seem to have noticed that this fact makes almost everything else you said completely irrelevant!
3Alex Turner10hOK, I'll bite on EY's exercise for the reader, on refuting this "what-if":
3Rob Bensinger12h(This post was partly written as a follow-up to Eliezer's conversations with Paul [https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/vwLxd6hhFvPbvKmBH] and Ajeya [https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/7MCqRnZzvszsxgtJi], so I've inserted it into the conversations sequence [https://www.lesswrong.com/s/n945eovrA3oDueqtq].)
6Eliezer Yudkowsky10hIt does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.

Google Podcasts link

This podcast is called AXRP, pronounced axe-urp and short for the AI X-risk Research Podcast. Here, I (Daniel Filan) have conversations with researchers about their papers. We discuss the paper and hopefully get a sense of why it’s been written and how it might reduce the risk of artificial intelligence causing an existential catastrophe: that is, permanently and drastically curtailing humanity’s future potential.

Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks about various techniques to reduce this risk.

Topics we discuss:

...