I interpreted steven0461 to be saying that many apparent "value disagreements" between humans turn out, upon reflection, to be disagreements about facts rather than values. It's a classic outcome concerning differences in conflict vs. mistake theory: people are interpreted as having different values because they favor different strategies, even if everyone shares the same values.
I had mixed feelings about the dialogue personally. I enjoy the writing style and think Eliezer is a great writer with a lot of good opinions and arguments, which made it enjoyable.
But at the same time, it felt like he was taking down a strawman. Maybe you’d label it part of “conflict aversion”, but I tend to have a negative reaction to take-downs of straw-people who agree with me.
To give an unfair and exaggerated comparison, it would be a bit like reading a take-down of a straw-rationalist in which the straw-rationalist occasionally insists such things as ... (read more)
My understanding is that the correct line is something like, "The COVID-19 vaccines were developed and approved unprecedentedly fast, excluding influenza vaccines." If you want to find examples of short vaccine development, you don't need to go all the way back to the 1957 influenza pandemic. For the 2009 Swine flu pandemic,
Analysis of the genetic divergence of the virus in samples from different cases indicated that the virus jumped to humans in 2008, probably after June, and not later than the end of November, likely around September 2008... By 19 No
It may help to visualize this graph with the line for Platt's Law drawn in.
Overall I find the law to be pretty much empirically validated, at least by the standards I'd expect from a half in jest Law of Prediction.
My honest guess is that most predictors didn’t see that condition and the distribution would shift right if someone pointed that out in the comments.
If this task is bad for operationalization reasons, there are other theorem proving benchmarks. Unfortunately it looks like there aren't a lot of people that are currently trying to improve on the known benchmarks, as far as I'm aware.
The code generation benchmarks are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark, which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (Github link).
I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available
It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equiv... (read more)
I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%. Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict. Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.
I frame no prediction about whether Paul is under 16%. That's a separate matter. I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.
I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024
Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.
I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.
Ha! Okay then. My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probabilit... (read more)
But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."
Don't you think you're making a falsifiable prediction here?
Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.
Thanks for clarifying. It makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/is a way to restrict to "superforecasters" in the UI).
I will add the following piece of evidence, which I don't think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question, a relatively high fraction of people put a 20% chance, but it also looks like over 80% of foreca... (read more)
It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).
It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability.
Perhaps a better source of evidence of A... (read more)
Wow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote.
Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]
A note I want to add, if this fact-check ends up being valid:
It appears that a significant fraction of Eliezer's argument relies on AlphaGo being surprising. But then his evidence for it being surprising seems to rest substantially on something that was misremembered. That seems important if true.
I would point to, for example, this quote, "I mean the superforecasters did already suck once in my observation, which was AlphaGo, but I did not bet against them there, I bet with them and then updated afterwards." It seems like the lesson here, if indeed superforecasters got AlphaGo right and Eliezer got it wrong, is that we should update a little bit towards superforecasting, and against Eliezer.
superforecasters were claiming that AlphaGo had a 20% chance of beating Lee Se-dol and I didn't disagree with that at the time
Good Judgment Open had the probability at 65% on March 8th 2016, with a generally stable forecast since early February (Wikipedia says that the first match was on March 9th).
Metaculus had the probability at 64% with similar stability over time. Of course, there might be another source that Eliezer is referring to, but for now I think it's right to flag this statement as false.
Reading through the recent Discord discussions with Eliezer, and reading and replying to comments, has given me the following impression of a crux of the takeoff debate. It may not be the crux. But it seems like a crux nonetheless, unless I'm misreading a lot of people.
Let me try to state it clearly:
The foom theorists are saying something like, "Well, you can usually-in-hindsight say that things changed gradually, or continuously, along some measure. You can use these measures after-the-fact, but that won't tell you about the actual gradual-ness of t... (read more)
+1 on using dynamical systems models to try to formalize the frameworks in this debate. I also give Eliezer points for trying to do something similar in Intelligence Explosion Microeconomics (and to people who have looked at this from the macro perspective).
What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?
Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.
I would generally be surprised if there were many highly non-li... (read more)
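For readers unfamiliar with the perplexity measure mentioned above, here is a minimal sketch of how it is usually computed: the exponential of the average negative log-probability the model assigns to each true token. The function name and the probability values are hypothetical, chosen only for illustration; they are not taken from any benchmark discussed here.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the given tokens.

    token_probs: probabilities the model assigned to each true next token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical example: a model that always spreads its belief evenly over
# four options has a perplexity of about 4.
uniform_guess = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_guess))

# Hypothetical example: a more confident model scores lower (better).
confident_guess = [0.9, 0.8, 0.95, 0.85]
print(perplexity(confident_guess))
```

Lower perplexity means the model is less "surprised" by the text, but the number says nothing by itself about task-specific measures, which is why the relationship between the two has to be studied empirically.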
That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I'm not going to look up; pretend I did that.) So on a graph of "language model of type X revenue over time", total revenue is static at 0 for a long time and then shortly after GPT-3's creation departs from 0.
I think it's the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue... (read more)
your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.
This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]
For example, ... (read more)
I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance.
This is a reasonable thesis, and if indeed it's the one Gwern intended, then I apologize for missing it! That said, I have a few objections,
it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.
My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics!
That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you... (read more)
Again, the fact that it is a straight line on a metric which, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit?
Because the point I was trying to make was that the result was relatively predictable? I'm genuinely confused what you're asking. I get a slight sense that you're interpreting me as saying something about the inherent dullness of GPT-3 or that it doesn't teach us anything interesting about AI, but I don't see myself as saying anything like that. I ac... (read more)
I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance. Qualitative capabilities are more relevant to safety than benchmark performance is, because while qualitative capabilities include things like "code a simple video game" and "summarize movies with emojis", they also include things like "break out of confinement and kill everyone". It's the latter capabil... (read more)
There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it!
Yeah, my phrasing there was not ideal. I regret using the word "marketing", but to be fair, I mostly meant what I said in the next few sentences, "Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to real... (read more)
Again, the fact that it is a straight line on a metric which, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit? That is, before they knew about how interesting the consequences would be of that small change in BPC?
At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?
Who's 'we', exactly? Who are these people who expected all of this to happen, and are going around saying "ah yes, these BIG-Ben... (read more)
To me GPT-3 feels much (much) closer to my mainline than to Eliezer's
To add to this sentiment, I'll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing.
The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It would have taken roughly one or two more years at ordinary progress to get t... (read more)
The impact of GPT-3 had nothing whatsoever to do with its perplexity on Penn Treebank. I think this is a good example of why focusing on perplexity and 'straight lines on graph go brr' is so terrible, such cargo cult mystical thinking, and crippling. There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it! They didn't even tweet the paper! (Forget an OA blog post, accompanying NYT/TR articles, twee... (read more)
Unfortunately, it looks like Yudkowsky and Christiano weren't able to come to an agreement on what bets to make.
In place of that, I'll ask, whatever camp you belong to: what concrete predictions do you make that you believe most strongly diverge from what people in the "other" camp believe, and can be resolved substantially before the world ends?
I propose we restrict our predictions to roughly 2026, which is pretty soon but probably not world-ending-soon (on almost all views).
I do think that if you get an AGI significantly past human intelligence in all respects, it would obviously tend to FOOM. I mean, I suspect that Eliezer fooms if you give an Eliezer the ability to backup, branch, and edit himself.
What improvements would you make to your brain that you would anticipate yielding greater intelligence? I can think of a few possible strategies:
For an AI, the first strategy is equivalen... (read more)
This milestone resembles the "Atari fifty" task in the 2016 Expert Survey in AI,
Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.
For context, the original Atari playing deep Q-network outperforms professional game testers on 47% of games, but used hundreds of hours of play to train.
Previously Katja Grace posted that the original Atari task had been achieved early. Experts estimated the Atari fifty task would take 5 years with 50% chance (so, in 2021), though they thought there was a 2... (read more)
The survey doesn't seem to define what 'human novice' performance is. But EfficientZero's performance curve looks pretty linear in Figure 7 over the 220k frames, finishing at ~1.9x human gametester performance after 2h (6x the allotted time). So presumably at 20min, EfficientZero is ~0.3x 2h-gametester-performance (1.9x * 1/6)? That doesn't strike me as being an improbable level of performance for a novice, so it's possible that challenge has been met. If not, seems likely that we're pretty close to it.
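The back-of-the-envelope estimate above can be written out as a short sketch. It assumes, as the comment does, that the performance curve is linear in training time and starts from zero; the function name is hypothetical, and the only inputs are the figures quoted in the comment (~1.9x gametester performance after 2 hours, evaluated at 20 minutes).

```python
def linear_score_at(minutes, final_score, final_minutes):
    """Interpolate a score assuming linear growth from 0 at t=0.

    This is an assumption for illustration, not a property taken from
    the EfficientZero paper itself.
    """
    return final_score * minutes / final_minutes


# Figures quoted in the comment: ~1.9x gametester performance after 120 min.
estimate = linear_score_at(minutes=20, final_score=1.9, final_minutes=120)
print(round(estimate, 2))  # 0.32, i.e. roughly the ~0.3x quoted above
```

If the real curve rises faster early on, as learning curves often do, the 20-minute figure would be higher than this linear estimate, so the sketch is best read as a rough lower-end guess under the stated assumption.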
Thanks for the useful comment.
You might say "okay, sure, at some level of scaling GPTs learn enough general reasoning that they can manage a corporation, but there's no reason to believe it's near".
Right. This is essentially the same way we might reply to Claude Shannon if he said that some level of brute-force search would solve the problem of natural language translation.
one of the major points of the bio anchors framework is to give a reasonable answer to the question of "at what level of scaling might this work", so I don't think you can argue that cur
These arguments prove too much; you could apply them to pretty much any technology (e.g. self-driving cars, 3D printing, reusable rockets, smart phones, VR headsets...).
I suppose my argument has an implicit, "current forecasts are not taking these arguments into account." If people actually were taking my arguments into account, and still concluding that we should have short timelines, then this would make sense. But, I made these arguments because I haven't seen people talk about these considerations much. For example, I deliberately avoided the argument ... (read more)
In addition to the reasons you mentioned, there's also empirical evidence that technological revolutions generally precede the productivity growth that they eventually cause. In fact, economic growth may even slow down as people pay costs to adopt new technologies. Philippe Aghion and Peter Howitt summarize the state of the research in chapter 9 of The Economics of Growth,
Although each [General Purpose Technology (GPT)] raises output and productivity in the long run, it can also cause cyclical fluctuations while the economy adjusts to it. As David (1990) a
If AGI is taken to mean, the first year that there is radical economic, technological, or scientific progress, then these are my AGI timelines.
I have a somewhat lower probability for near-term AGI than many people here do. I model my biggest disagreement as being about how much work is required to move from high-cost impressive demos to real economic performance. I also have an intuition that it is really hard to automate everything and progress will be bottlene... (read more)
It's unclear to me what "human-level AGI" is, and it's also unclear to me why the prediction is about the moment an AGI is turned on somewhere. From my perspective, the important thing about artificial intelligence is that it will accelerate technological, economic, and scientific progress. So, the more important thing to predict is something like, "When will real economic growth rates reach at least 30% worldwide?"
It's worth comparing the vagueness in this question with the specificity in this one on Metaculus. From the ... (read more)
To me the most obvious risk (which I don't ATM think of as very likely for the next few iterations, or possibly ever, since the training is myopic/SL) would be that GPT-N in fact is computing (e.g. among other things) a superintelligent mesa-optimization process that understands the situation it is in and is agent-y.
Do you have any idea of what the mesa objective might be? I agree that this is a worrisome risk, but I was more interested in the type of answer that specified, "Here's a plausible mesa objective given the incentives." Mesa optimization is a more general risk that isn't specific to the narrow training scheme used by GPT-N.
Second, the major disagreement is between those who think progress will be discontinuous and sudden (such as Eliezer Yudkowsky, MIRI) and those who think progress will be very fast by normal historical standards but continuous (Paul Christiano, Robin Hanson).
I'm not actually convinced this is a fair summary of the disagreement. As I explained in my post about different AI takeoffs, I had the impression that the primary disagreement between the two groups was over locality rather than the amount of time takeoff lasts. Though of course, I may be misinterpreting people.
They do disagree about locality, yes, but as far as I can tell that is downstream of the assumption that there won't be a very abrupt switch to a new growth mode. A single project pulling suddenly ahead of the rest of the world would happen if the growth curve is such that with a realistic amount (a few months) of lead time you can get ahead of everyone else.
So the obvious difference in predictions is that e.g. Paul/Robin think that takeoff will occur across many systems in the world while MIRI thinks it will occur in a single system. That is because ... (read more)
I tend to think that the pandemic shares more properties with fast takeoff than it does with slow takeoff. Under fast takeoff, a very powerful system will spring into existence after a long period of AI being otherwise irrelevant, in a similar way to how the virus was dormant until early this year. The defining feature of slow takeoff, by contrast, is a gradual increase in abilities from AI systems all across the world.
In particular, I object to this portion of your post,
The "moving goalposts" effect, where new advances in AI are dismissed as not
Oops yes. That's the weaker claim, that I agree with. The stronger claim is that because we can't understand something "all at once" then mechanistic transparency is too hard and so we shouldn't take Daniel's approach. But the way we understand laptops is also in a mechanistic sense. No one argues that because laptops are too hard to understand all at once, then we shouldn't try to understand them mechanistically.
This seems to be assuming that we have to be able to take any complex trained AGI-as-a-neural-net
I'd be shocked if there was anyone to whom it was mechanistically transparent how a laptop loads a website, down to the gates in the laptop.
Could you clarify why this is an important counterpoint? It seems obviously useful to understand mechanistic details of a laptop in order to debug it. You seem to be arguing the [ETA: weaker] claim that nobody understands an entire laptop "all at once", as in, they can understand all the details in their head simultaneously. But such an understanding is almost never possible for any complex system, a... (read more)
I liked it.
For my part, I think you summarized my position fairly well. However, after thinking about this argument for another few days, I have more points to add.
Later, other Europeans would come along with other advantages, and they would conquer India, Persia, Vietnam, etc., evidence that while disease was a contributing factor (I certainly am not denying it helped!) it wasn't so important a factor as to render my conclusion invalid (my conclusion, again, is that a moderate technological and strategic advantage can enable a small group to take over a large region.)
Europeans conquered places such as India, but that was centuries later, after they had a large technological advantage, and they also didn't ... (read more)
I really don't think the disease thing is important enough to undermine my conclusion. For the two reasons I gave: One, Afonso didn't benefit from disease
This makes sense, but I think the case of Afonso is sufficiently different from the others that it's a bit of a stretch to use it to imply much about AI takeovers. I think if you want to make a more general point about how AI can be militarily successful, then a better point of evidence is a broad survey of historical military campaigns. Of course, it's still a historically interesting... (read more)
I agree that it would be good to think about how AI might create devastating pandemics. I suspect it wouldn't be that hard to do, for an AI that is generally smarter than us. However, I think my original point still stands.
It's worth clarifying exactly what "original point" stands because I'm currently unsure.
I don't get why you think a small technologically primitive tribe could take over the world if they were immune to disease. Seems very implausible to me.
Sorry, I meant to say, "Were immune to diseases that were curre... (read more)
Here's what I'll be putting in the Alignment Newsletter about this piece. Let me know if you spot inaccuracies or lingering disagreement regarding the opinion section.
This post lists three historical examples of how small human groups conquered large parts of the world, and shows how they are arguably precedents for AI takeover scenarios. The first two historical examples are the conquests of American civilizations by Hernán Cortés and Francisco Pizarro in the early 16th century. The third example is the Portuguese capture of key
[ETA: Another way of framing my disagreement is that if you are trying to argue that small groups can take over the world, it seems almost completely irrelevant to focus on relative strategic or technological advantages in light of these historical examples. For instance, it could have theoretically been that some small technologically primitive tribe took over the world if they had some sort of immunity to disease. This would seem to imply that relative strategic advantages in Europeans vs. Americans were not that important. Instead we should focus on what... (read more)
Do you have any thoughts on the critique I just posted?
Very interesting post! However, I have a big disagreement with your interpretation of why the European conquerors succeeded in America, and I think that it undermines much of your conclusion.
In your section titled "What explains these devastating takeovers?" you cite technology and strategic ability, but Old World diseases destroyed the communities in America before the European invaders arrived, most notably smallpox, but also measles, influenza, typhus and the bubonic plague. My reading of historians (from Charles Mann's book 1493, to Alfr... (read more)
This is a good critique; thank you.
I have two responses, and then a few nitpicks.
First response: Disease wasn't a part of Afonso's success. It helped the Europeans take over the Americas but did not help them take over Africa or Asia or the Middle East; this suggests to me that it may have been a contributing factor but was not the primary explanation / was not strictly necessary.
Second response: Even if we decide that Cortes and Pizarro wouldn't have been able to succeed without the disease, my overall conclusion still stands. This is beca... (read more)
See also Alex Turner's work on formalizing instrumentally convergent goals, and his walkthrough of the MIRI paper.
That's not what I said.
That's fair. I didn't actually quite understand what your position was and was trying to clarify.
I think it's plausible that there will be a simple basin that we can regularise an AGI into, because I have some ideas about how to do it, and because the world hasn't thought very hard about the problem yet (meaning the lack of extant solutions is to some extent explained away).
That makes sense. More pessimistically, one could imagine that the reason why no one has thought very hard about it is because in practice, it doesn't really help you that much to have a mechanistic understanding of a neural network in order to do useful work. Though... (read more)
I greatly appreciate you writing your thoughts up. I have a few questions about your agenda/optimism regarding particular approaches.
The type of transparency that I’m most excited about is mechanistic, in a sense that I’ve described elsewhere.
Let me know if you'd agree with the following. The mechanistic approach is about understanding the internal structure of a program and how it behaves on arbitrary inputs. Mechanistic transparency is quite different from the more typical meaning of interpretability where we would like to know why an AI d... (read more)
see above about trying to conform with the way terms are used, rather than defining terms and trying to drag everyone else along.
This seems odd given your objection to "soft/slow" takeoff usage and your advocacy of "continuous takeoff" ;)
Does this make sense to you?
Yeah that makes sense. Your points about "bio" not being short for "biological" were valid, but the fact that as a listener I didn't know that fact implies that it seems really easy to mess up the language usage here. I'm starting to think that the real fight should be about using terms that aren't self explanatory.
Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?
I'm not sure about whether it would have be... (read more)
I agree that this is troubling, though I think it's similar to how I wouldn't want the term biorisk to be expanded to include biodiversity loss (a risk, but not the right type), regular human terrorism (humans are biological, but it's a totally different issue), zombie uprisings (they are biological, but it's totally ridiculous), alien invasions etc.
Not to say that's what you are doing with AI risk. I'm worried about what others will do with it if the term gets expanded.