All of Matthew Barnett's Comments + Replies

Considerations on interaction between AI and expected value of the future

I interpreted steven0461 to be saying that many apparent "value disagreements" between humans turn out, upon reflection, to be disagreements about facts rather than values. It's a classic conflict-theory vs. mistake-theory dynamic: people are interpreted as having different values because they favor different strategies, even when everyone shares the same values.

Beth Barnes: ah yeah, so the claim is something like 'if we think other humans have 'bad values', maybe in fact our values are the same and one of us is mistaken, and we'll get less mistaken over time'?
Biology-Inspired AGI Timelines: The Trick That Never Works

I had mixed feelings about the dialogue personally. I enjoy the writing style, and Eliezer is a great writer with a lot of good opinions and arguments, which made it an enjoyable read.

But at the same time, it felt like he was taking down a strawman. Maybe you’d label it part of “conflict aversion”, but I tend to get a negative reaction to take-downs of straw-people who agree with me.

To give an unfair and exaggerated comparison, it would be a bit like reading a take-down of a straw-rationalist in which the straw-rationalist occasionally insists such things as ... (read more)

Shulman and Yudkowsky on AI progress

My understanding is that the correct line is something like, "The COVID-19 vaccines were developed and approved unprecedentedly fast, excluding influenza vaccines." If you want examples of rapid vaccine development, you don't need to go all the way back to the 1957 influenza pandemic. For the 2009 swine flu pandemic,

Analysis of the genetic divergence of the virus in samples from different cases indicated that the virus jumped to humans in 2008, probably after June, and not later than the end of November,[38] likely around September 2008... By 19 No

... (read more)
Biology-Inspired AGI Timelines: The Trick That Never Works

It may help to visualize this graph with the line for Platt's Law drawn in.

Overall I find the law to be pretty much empirically validated, at least by the standards I'd expect of a half-in-jest Law of Prediction.
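For anyone who wants to redraw that comparison, here is a minimal sketch of the visualization. The forecast points below are made-up placeholders (not the actual dataset behind the graph); only the dashed reference line, predicted year = year of forecast + 30, encodes Platt's Law.

```python
# Sketch of the suggested plot. The forecast data below are hypothetical
# placeholders; only the dashed line encodes Platt's Law ("strong AI is
# roughly 30 years out from whenever the forecast is made").
import matplotlib.pyplot as plt

# (year the forecast was made, predicted year of strong AI): placeholder data
forecasts = [(1965, 2000), (1980, 2005), (1993, 2029), (2005, 2040), (2016, 2045)]
made, predicted = zip(*forecasts)

plt.scatter(made, predicted, label="individual forecasts (placeholder data)")
years = range(1960, 2026)
plt.plot(years, [y + 30 for y in years], linestyle="--",
         label="Platt's Law: always ~30 years away")
plt.xlabel("Year forecast was made")
plt.ylabel("Predicted year of strong AI")
plt.legend()
plt.show()
```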

Yudkowsky and Christiano discuss "Takeoff Speeds"

My honest guess is that most predictors didn't see that condition, and that the distribution would shift right if someone pointed it out in the comments.

Yudkowsky and Christiano discuss "Takeoff Speeds"

If this task is bad for operationalization reasons, there are other theorem-proving benchmarks. Unfortunately, as far as I'm aware, not many people are currently trying to improve on the known benchmarks.

The code generation benchmarks are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark, which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (Github link).

Yudkowsky and Christiano discuss "Takeoff Speeds"

I'll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/heldout dataset of past IMO problems, conditional on such a dataset being available

It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equiv... (read more)

Rob Bensinger: My model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'. My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the blue'. Again, I may be wrong, but my intuition is that you're Paul-omorphizing Eliezer when you assume that >16% probability of huge progress in X by year Y implies >50% probability of smaller-but-meaningful progress in X by year Y.

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%.  Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict.  Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%.  That's a separate matter.  I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024

Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.

I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.

Paul Christiano: I think Metaculus is closer to Eliezer here: conditioned on this problem being resolved it seems unlikely for the AI to be either open-sourced or easily reproducible.

Ha!  Okay then.  My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probabilit... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."

Don't you think you're making a falsifiable prediction here?

Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.

Eliezer Yudkowsky: (I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)
Christiano, Cotra, and Yudkowsky on AI progress

Thanks for clarifying. That makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/is a way to restrict to "superforecasters" in the UI).

I will add the following piece of evidence, which I don't think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question, a relatively high fraction of people put a 20% chance, but it also looks like over 80% of foreca... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).

It seems hard to interpret this as AlphaGo being inherently surprising, though, because the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability.

Perhaps a better source of evidence of A... (read more)

Wow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote.

Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]

Christiano, Cotra, and Yudkowsky on AI progress

A note I want to add, if this fact-check ends up being valid:

It appears that a significant fraction of Eliezer's argument relies on AlphaGo being surprising. But then his evidence for it being surprising seems to rest substantially on something that was misremembered. That seems important if true.

I would point to, for example, this quote, "I mean the superforecasters did already suck once in my observation, which was AlphaGo, but I did not bet against them there, I bet with them and then updated afterwards." It seems like the lesson here, if indeed superforecasters got AlphaGo right and Eliezer got it wrong, is that we should update a little bit towards superforecasting, and against Eliezer.

Ben Pace: Adding my recollection of that period: some people made the relevant updates when DeepMind's system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going "Oh huh, I think this is going to happen" and then when AlphaGo beat Lee Sedol (in March 2016) everyone said "Now it is happening".
Christiano, Cotra, and Yudkowsky on AI progress

superforecasters were claiming that AlphaGo had a 20% chance of beating Lee Se-dol and I didn't disagree with that at the time

Good Judgment Open had the probability at 65% on March 8th 2016, with a generally stable forecast since early February (Wikipedia says that the first match was on March 9th).

Metaculus had the probability at 64% with similar stability over time. Of course, there might be another source that Eliezer is referring to, but for now I think it's right to flag this statement as false.

Eliezer Yudkowsky: My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was. Neither GJO nor Metaculus are restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%. Here's an example of one such, which I have a potentially false memory of having maybe read at the time: https://www.gjopen.com/comments/118530

Matthew Barnett's Shortform

Reading through the recent Discord discussions with Eliezer, and reading and replying to comments, has given me the following impression of a crux of the takeoff debate. It may not be the crux. But it seems like a crux nonetheless, unless I'm misreading a lot of people. 

Let me try to state it clearly:

The foom theorists are saying something like, "Well, you can usually-in-hindsight say that things changed gradually, or continuously, along some measure. You can use these measures after-the-fact, but that won't tell you about the actual gradual-ness of t... (read more)

Adele Lopez: I lean toward the foom side, and I think I agree with the first statement. The intuition for me is that it's kinda like p-hacking (there are very many possible graphs, and some percentage of those will be gradual), or using a log-log plot (which makes everything look like a nice straight line, but are actually very broad predictions when properly accounting for uncertainty). Not sure if I agree with the addendum or not yet, and I'm not sure how much of a crux this is for me yet.
Christiano, Cotra, and Yudkowsky on AI progress

+1 on using dynamical systems models to try to formalize the frameworks in this debate. I also give Eliezer points for trying to do something similar in Intelligence Explosion Microeconomics (and to people who have looked at this from the macro perspective).

Yudkowsky and Christiano discuss "Takeoff Speeds"

What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?

Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.

I would generally be surprised if there were many highly non-li... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I'm not going to look up; pretend I did that.) So on a graph of "language model of type X revenue over time",  total revenue is static at 0 for a long time and then shortly after GPT-3's creation departs from 0.

I think it's the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue... (read more)

your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.

This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]

For example, ... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance.

This is a reasonable thesis, and if indeed it's the one Gwern intended, then I apologize for missing it!

That said, I have a few objections:

  • Isn't it a bit suspicious that the thing-that's-discontinuous is hard to measure, but the-thing-that's-continuous isn't? I mean, this isn't totally suspicious, because
... (read more)
Edouard Harris: Yeah, these are interesting points. I sympathize with this view, and I agree there is some element of truth to it that may point to a fundamental gap in our understanding (or at least in mine). But I'm not sure I entirely agree that discontinuous capabilities are necessarily hard to measure: for example, there are benchmarks [https://github.com/openai/grade-school-math] available for things like arithmetic, which one can train on and make quantitative statements about.

I think the key to the discontinuity question is rather that 1) it's the jumps in model scaling that are happening in discrete increments; and 2) everything is S-curves, and a discontinuity always has a linear regime if you zoom in enough. Those two things together mean that, while a capability like arithmetic might have a continuous performance regime on some domain, in reality you can find yourself halfway up the performance curve in a single scaling jump (and this is in fact what happened with arithmetic and GPT-3). So the risk, as I understand it, is that you end up surprisingly far up the scale of "world-ending" capability from one generation to the next, with no detectable warning shot beforehand.

No, you're right as far as I know; at least I'm not aware of any such attempted predictions. And in fact, the very absence of such prediction attempts is interesting in itself. One would imagine that correctly predicting the capabilities of an AI from its scale ought to be a phenomenally valuable skill — not just from a safety standpoint, but from an economic one too. So why, indeed, didn't we see people make such predictions, or at least try to? There could be several reasons. For example, perhaps Paul (and other folks who subscribe to the "continuum" world-model) could have done it, but they were unaware of the enormous value of their predictive abilities. That seems implausible, so let's assume they knew the value of such predictions would be huge. But if you know the value of doing something i
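A tiny numerical illustration of the S-curve point above; the logistic curve, its midpoint, and the roughly 30x-per-generation scaling steps are arbitrary assumptions rather than fits to any real capability. The capability is essentially absent for two generations, then a single scaling jump lands roughly halfway up the curve, and the next one nearly saturates it.

```python
# Illustration only: a capability that follows a smooth S-curve in log-compute
# can still jump from "basically absent" to "halfway there" within one discrete
# scaling step. Curve shape and step size are assumptions, not fitted values.
import math

def capability(log10_compute, midpoint=24.0, width=0.4):
    """Logistic 'performance' as a function of log10(training compute)."""
    return 1.0 / (1.0 + math.exp(-(log10_compute - midpoint) / width))

# Model generations released at discrete jumps of ~1.5 orders of magnitude (~30x)
for gen, log_c in enumerate([21.0, 22.5, 24.0, 25.5]):
    print(f"gen {gen}: log10(compute) = {log_c:4.1f}   capability = {capability(log_c):.3f}")
```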

it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics! 

That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

Again, the fact that it is a straight line on a metric which is, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit?

Because the point I was trying to make was that the result was relatively predictable? I'm genuinely confused what you're asking. I get a slight sense that you're interpreting me as saying something about the inherent dullness of GPT-3 or that it doesn't teach us anything interesting about AI, but I don't see myself as saying anything like that. I ac... (read more)

I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we've seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance. Qualitative capabilities are more relevant to safety than benchmark performance is, because while qualitative capabilities include things like "code a simple video game" and "summarize movies with emojis", they also include things like "break out of confinement and kill everyone". It's the latter capabil... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it!

Yeah, my phrasing there was not ideal. I regret using the word "marketing", but to be fair, I mostly meant what I said in the next few sentences: "Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to real... (read more)

Again, the fact that it is a straight line on a metric which is, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit? That is, before they knew about how interesting the consequences would be of that small change in BPC?

At the same time, don't you think we would have expected similar results in like two more years at ordinary progress?

Who's 'we', exactly? Who are these people who expected all of this to happen, and are going around saying "ah yes, these BIG-Ben... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

To me GPT-3 feels much (much) closer to my mainline than to Eliezer's

To add to this sentiment, I'll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing. 

The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It roughly would have taken about one or two more years at ordinary progress to get t... (read more)
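One rough way to make the "one or two more years at ordinary progress" estimate concrete is to fit the pre-GPT-3 trend and ask when that trend would have reached the new result. A minimal sketch with placeholder numbers (not the real Penn Treebank values):

```python
# Rough "years ahead of trend" calculation; the perplexity figures are made-up
# placeholders, NOT the real Penn Treebank data.
import numpy as np

# (year, state-of-the-art perplexity) before the new result (hypothetical)
history = np.array([(2014, 80.0), (2016, 65.0), (2018, 47.0), (2019, 40.0)])
years, ppl = history[:, 0], history[:, 1]

slope, intercept = np.polyfit(years, ppl, 1)   # linear trend in perplexity
new_year, new_ppl = 2020, 20.0                 # hypothetical GPT-3-like result
trend_year = (new_ppl - intercept) / slope     # when the old trend reaches it
print(f"Discontinuity: about {trend_year - new_year:.1f} years ahead of trend")
```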

The impact of GPT-3 had nothing whatsoever to do with its perplexity on Penn Treebank. I think this is a good example of why focusing on perplexity and 'straight lines on graph go brr' is so terrible, such cargo cult mystical thinking, and crippling. There's something astonishing to see someone resort to explaining away GPT-3's impact as 'OpenAI was just good at marketing the results'. Said marketing consisted of: 'dropping a paper on Arxiv'. Not even tweeting it! They didn't even tweet the paper! (Forget an OA blog post, accompanying NYT/TR articles, twee... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

Unfortunately, it looks like Yudkowsky and Christiano weren't able to come to an agreement on what bets to make.

In place of that, I'll ask, whatever camp you belong to: what concrete predictions do you make that you believe most strongly diverge from what people in the "other" camp believe, and can be resolved substantially before the world ends?

I propose we restrict our predictions to roughly 2026, which is pretty soon but probably not world-ending-soon (on almost all views).

Discussion with Eliezer Yudkowsky on AGI interventions

I do think that if you get an AGI significantly past human intelligence in all respects, it would obviously tend to FOOM. I mean, I suspect that Eliezer fooms if you give an Eliezer the ability to backup, branch, and edit himself.

What improvements would you make to your brain that you would anticipate yielding greater intelligence? I can think of a few possible strategies:

  • Just adding a bunch of neurons everywhere. Make my brain bigger.
  • Study how very smart brains look, and try to make my brain look more like theirs.

For an AI, the first strategy is equivalen... (read more)

Daniel Kokotajlo: EY knows more neuroscience than me (I know very little) but here's a 5-min brainstorm of ideas:

  • For a fixed compute budget, spend more of it on neurons associated with higher-level thought (the neocortex?) and less of it on neurons associated with e.g. motor control or vision.
  • Assuming we are an upload of some sort rather than a physical brain, tinker with the rules a bit so that e.g. neuron waste products get magically deleted instead of having to be pumped out, neurons never run out of energy/oxygen and need to rest, etc.
  • Study situations where you are in "peak performance" or "flow" and then explore ways to make your brain enter those states at will.
  • Use ML pruning techniques to cut away neurons that aren't being useful, to get slightly crappier mini-Eliezers that cost 10% the compute. These can then automate away 90% of your cognition, saving you enough compute that you can either think a few times faster or have a few copies running in parallel.
  • Build automated tools that search through your brain for circuits that are doing something pretty simple, like a giant OR gate or an oscillator, and then replace those circuits with small bits of code, thereby saving significant compute. If anything goes wrong, no worries, just revert to backup.

This was a fun exercise!
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

This milestone resembles the "Atari fifty" task in the 2016 Expert Survey in AI,

Outperform human novices on 50% of Atari games after only 20 minutes of training play time and no game specific knowledge.

For context, the original Atari-playing deep Q-network outperformed professional game testers on 47% of games, but used hundreds of hours of play to train.

Previously Katja Grace posted that the original Atari task had been achieved early. Experts estimated the Atari fifty task would take 5 years with 50% chance (so, in 2021), though they thought there was a 2... (read more)

The survey doesn't seem to define what 'human novice' performance is. But EfficientZero's performance curve looks pretty linear in Figure 7 over the 220k frames, finishing at ~1.9x human gametester performance after 2h (6x the allotted time). So presumably at 20min, EfficientZero is ~0.3x 2h-gametester-performance (1.9x * 1/6)? That doesn't strike me as being an improbable level of performance for a novice, so it's possible that challenge has been met. If not, seems likely that we're pretty close to it.
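The back-of-the-envelope behind that ~0.3x figure, assuming the reported score scales roughly linearly with training play time (as the linear-looking curve in Figure 7 suggests):

```python
# Linear extrapolation from the 2h result down to the survey's 20-minute budget.
# Assumes the relative score grows roughly linearly with training play time.
final_score = 1.9        # EfficientZero vs. human gametester after 2h of play
train_minutes = 120
target_minutes = 20      # training budget in the survey's "Atari fifty" task
estimate = final_score * target_minutes / train_minutes
print(f"Estimated relative score at 20 min: ~{estimate:.2f}x gametester")  # ~0.32x
```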

Three reasons to expect long AI timelines

Thanks for the useful comment.

You might say "okay, sure, at some level of scaling GPTs learn enough general reasoning that they can manage a corporation, but there's no reason to believe it's near".

Right. This is essentially the same way we might reply to Claude Shannon if he said that some level of brute-force search would solve the problem of natural language translation.

one of the major points of the bio anchors framework is to give a reasonable answer to the question of "at what level of scaling might this work", so I don't think you can argue that cur

... (read more)
Rohin Shah: Fwiw, the problem I think is hard is "how to make models do stuff that is actually what we want, rather than only seeming like what we want, or only initially what we want until the model does something completely different like taking over the world". I don't expect that it will be hard to get models that look like they're doing roughly the thing we want; see e.g. the relative ease of prompt engineering or learning from human preferences. If I thought that were hard, I would agree with you. I would guess that this is relatively uncontroversial as a view within this field? Not sure though. (One of my initial critiques of bio anchors was that it didn't take into account the cost of human feedback, except then I actually ran some back-of-the-envelope calculations and it turned out it was dwarfed by the cost of compute; maybe that's your crux too?)
Three reasons to expect long AI timelines

These arguments prove too much; you could apply them to pretty much any technology (e.g. self-driving cars, 3D printing, reusable rockets, smart phones, VR headsets...).

I suppose my argument has an implicit premise: "current forecasts are not taking these arguments into account." If people actually were taking my arguments into account, and still concluding that we should have short timelines, then this would make sense. But I made these arguments because I haven't seen people talk about these considerations much. For example, I deliberately avoided the argument ... (read more)

Daniel Kokotajlo: I definitely agree that our timelines forecasts should take into account the three phenomena you mention, and I also agree that e.g. Ajeya's doesn't talk about this much. I disagree that the effect size of these phenomena is enough to get us to 50 years rather than, say, +5 years to whatever our opinion sans these phenomena was. I also disagree that overall Ajeya's model is an underestimate of timelines, because while indeed the phenomena you mention should cause us to shade timelines upward, there is a long list of other phenomena I could mention which should cause us to shade timelines downward, and it's unclear which list is overall more powerful.

On a separate note, would you be interested in a call sometime to discuss timelines? I'd love to share my overall argument with you and hear your thoughts, and I'd love to hear your overall timelines model if you have one.
Against GDP as a metric for timelines and takeoff speeds

In addition to the reasons you mentioned, there's also empirical evidence that technological revolutions generally precede the productivity growth that they eventually cause. In fact, economic growth may even slow down as people pay costs to adopt new technologies. Philippe Aghion and Peter Howitt summarize the state of the research in chapter 9 of The Economics of Growth,

Although each [General Purpose Technology (GPT)] raises output and productivity in the long run, it can also cause cyclical fluctuations while the economy adjusts to it. As David (1990) a

... (read more)
Daniel Kokotajlo: Wow, yeah, that's an excellent point. EDIT: See e.g. this paper: https://www.nber.org/papers/w24001
Forecasting Thread: AI Timelines

If AGI is taken to mean the first year in which there is radical economic, technological, or scientific progress, then these are my AGI timelines.

My percentiles

  • 5th: 2029-09-09
  • 25th: 2049-01-17
  • 50th: 2079-01-24
  • 75th: above 2100-01-01
  • 95th: above 2100-01-01

I have a somewhat lower probability for near-term AGI than many people here do. I model my biggest disagreement as being about how much work is required to move from high-cost impressive demos to real economic performance. I also have an intuition that it is really hard to automate everything and that progress will be bottlene... (read more)

Forecasting Thread: AI Timelines

It's unclear to me what "human-level AGI" is, and it's also unclear to me why the prediction is about the moment an AGI is turned on somewhere. From my perspective, the important thing about artificial intelligence is that it will accelerate technological, economic, and scientific progress. So, the more important thing to predict is something like, "When will real economic growth rates reach at least 30% worldwide?"

It's worth comparing the vagueness in this question with the specificity in this one on Metaculus. From the ... (read more)

jungofthewon: I generally agree with this but think the alternative goal of "make forecasting easier" is just as good, might actually make aggregate forecasts more accurate in the long run, and may require things that seemingly undermine the virtue of precision. More concretely, if an underdefined question makes it easier for people to share whatever beliefs they already have, then facilitates rich conversation among those people, that's better than if a highly specific question prevents people from making a prediction at all. At least as much, if not more, of the value of making public, visual predictions like this comes from the ensuing conversation and feedback than from the precision of the forecasts themselves. Additionally, a lot of assumptions get made at the time the question is defined more precisely, which could prematurely limit the space of conversation or ideas. There are good reasons why different people define AGI the way they do, or the moment of "AGI arrival" the way they do, that might not come up if the question askers had taken a point of view.
What specific dangers arise when asking GPT-N to write an Alignment Forum post?
To me the most obvious risk (which I don't ATM think of as very likely for the next few iterations, or possibly ever, since the training is myopic/SL) would be that GPT-N in fact is computing (e.g. among other things) a superintelligent mesa-optimization process that understands the situation it is in and is agent-y.

Do you have any idea what the mesa objective might be? I agree that this is a worrisome risk, but I was more interested in the type of answer that specified, "Here's a plausible mesa objective given the incentives." Mesa optimization is a more general risk that isn't specific to the narrow training scheme used by GPT-N.

David Krueger: No, and I don't think it really matters too much... what's more important is the "architecture" of the "mesa-optimizer". It's doing something that looks like search/planning/optimization/RL. Roughly speaking, the simplest form of this model of how things works says: "Its so hard to solve NLP without doing agent-y stuff that when we see GPT-N produce a solution to NLP, we should assume that it's doing agenty stuff on the inside... i.e. what probably happened is it evolved or stumbled upon something agenty, and then that agenty thing realized the situation it was in and started plotting a treacherous turn".
Modelling Continuous Progress
Second, the major disagreement is between those who think progress will be discontinuous and sudden (such as Eliezer Yudkowsky, MIRI) and those who think progress will be very fast by normal historical standards but continuous (Paul Christiano, Robin Hanson).

I'm not actually convinced this is a fair summary of the disagreement. As I explained in my post about different AI takeoffs, I had the impression that the primary disagreement between the two groups was over locality rather than the amount of time takeoff lasts. Though of course, I may be misinterpreting people.

Samuel Dylan Martin: After reading your summary of the difference [https://www.lesswrong.com/posts/YgNYA6pj2hPSDQiTE/distinguishing-definitions-of-takeoff#Paul_slow_takeoff] (maybe just a difference in emphasis) between 'Paul slow' vs 'continuous' takeoff, I did some further simulations. A low setting of d (highly continuous progress) doesn't give you a Paul-slow condition on its own, but it is relatively easy to replicate a situation like this: what we want is a scenario where you don't get intermediate doubling intervals at all in the discontinuous case, but you get at least one in the continuous case. Setting s relatively high appears to do the trick. Here is a scenario [https://i.imgur.com/RSjIKQH.png] where we have very fast post-RSI growth with s=5, c=1, I0=1 and I_AGI=3.

I wrote some more code to produce plots of how long each complete interval of doubling took [https://i.imgur.com/mw27P7H.png] in each scenario. The 'default' rate with no contribution from RSI was 0.7. All the continuous scenarios had two complete doubling intervals over intermediate time frames before the doubling time collapsed to under 0.05 on the third doubling. The discontinuous model simply kept the original doubling interval until it collapsed to under 0.05 on the third doubling interval. It's all in this graph [https://i.imgur.com/mw27P7H.png].

Let's make the irresponsible assumption that this actually applies to the real economy, with the current growth mode, non-RSI condition being given by the 'slow/no takeoff', s=0 condition. The current doubling time is a bit over 23 years [https://openborders.info/double-world-gdp/]. In the shallow continuous progress scenario (red line), we get a 9-year doubling, a 4-year doubling and then a ~1-year doubling. In the discontinuous scenario (purple line) we get two 23-year doublings and then a ~1-year doubling out of nowhere. In other words, this fairly random setting of the parameters (this was the second set I tried) gives us a Paul slow takeoff if you make the assum
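For readers who want to experiment with this kind of comparison, here is a minimal toy sketch in the same spirit. The parameter names (s, c, d, I0, I_AGI) mirror the comment above, but the equations are illustrative assumptions rather than the ones actually used in the post: intelligence grows exponentially at a base rate, an RSI term of strength s switches on around I_AGI with abruptness controlled by d, and the script reports how long each successive doubling of I took.

```python
# Toy takeoff model (illustrative assumptions only, not the post's exact equations).
# Intelligence I grows at base rate c; an RSI term of strength s switches on
# around I_AGI, with d controlling how abrupt that switch is.
import math

def rsi_factor(I, I_AGI, d):
    """Rises from ~0 to ~1 as I passes I_AGI; small d gives a near-step switch."""
    return 1.0 / (1.0 + math.exp(-(I - I_AGI) / d))

def doubling_times(s, c=1.0, d=0.5, I0=1.0, I_AGI=3.0, dt=1e-3, t_max=50.0):
    I, t, last, times, target = I0, 0.0, 0.0, [], 2 * I0
    while t < t_max and I < 100:
        I += c * I * (1.0 + s * rsi_factor(I, I_AGI, d)) * dt   # Euler step
        t += dt
        if I >= target:
            times.append(round(t - last, 2))   # length of this doubling interval
            last, target = t, target * 2
    return times

for d in (0.05, 1.0):   # near-discontinuous vs. smooth onset of RSI
    print(f"d = {d}: doubling intervals {doubling_times(s=5, d=d)}")
```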

They do disagree about locality, yes, but as far as I can tell that is downstream of the assumption that there won't be a very abrupt switch to a new growth mode. A single project pulling suddenly ahead of the rest of the world would happen if the growth curve is such that with a realistic amount (a few months) of lead time you can get ahead of everyone else.

So the obvious difference in predictions is that e.g. Paul/Robin think that takeoff will occur across many systems in the world while MIRI thinks it will occur in a single system. That is because ... (read more)

Possible takeaways from the coronavirus pandemic for slow AI takeoff

I tend to think that the pandemic shares more properties with fast takeoff than it does with slow takeoff. Under fast takeoff, a very powerful system will spring into existence after a long period of AI being otherwise irrelevant, in a similar way to how the virus was dormant until early this year. The defining feature of slow takeoff, by contrast, is a gradual increase in abilities from AI systems all across the world.

In particular, I object to this portion of your post,

The "moving goalposts" effect, where new advances in AI are dismissed as not
... (read more)
Vika: Thanks Matthew for your interesting points! I agree that it's not clear whether the pandemic is a good analogy for slow takeoff. When I was drafting the post, I started with an analogy with "medium" takeoff (on the time scale of months), but later updated towards the slow takeoff scenario being a better match. The pandemic response in 2020 (since covid became apparent as a threat) is most relevant for the medium takeoff analogy, while the general level of readiness for a coronavirus pandemic prior to 2020 is most relevant for the slow takeoff analogy.

I agree with Ben's response [https://www.lesswrong.com/posts/wTKjRFeSjKLDSWyww/possible-takeaways-from-the-coronavirus-pandemic-for-slow-ai?commentId=YcRJtpNHf2ewLCtt8] to your comment. Covid did not spring into existence in a world where pandemics are irrelevant, since there have been many recent epidemics and experts have been sounding the alarm about the next one. You make a good point that epidemics don't gradually increase in severity, though I think they have been increasing in frequency and global reach as a result of international travel, and the possibility of a virus escaping from a lab also increases the chances of encountering more powerful pathogens in the future. Overall, I agree that we can probably expect AI systems to increase in competence more gradually in a slow takeoff scenario, which is a reason for optimism.

Your objections to the parallel with covid not being taken seriously seem reasonable to me, and I'm not very confident in this analogy overall. However, one could argue that the experience with previous epidemics should have resulted in a stronger prior on pandemics being a serious threat. I think it was clear from the outset of the covid epidemic that it's much more contagious than seasonal flu, which should have produced an update towards it being a serious threat as well. I agree that the direct economic effects of advanced AI would be obvious to observers, but I don't think this wo
Ben Pace: Some good points, but on the contrary: a slow take-off is considered safer because we have more lead time and warning shots, but the world has seen many similar events and warning shots for covid. Ones that come to mind in the last two decades are swine flu, bird flu, and Ebola, and of course there have been many more over history. This just isn't that novel or surprising, billionaires like Bill Gates have been sounding the alarm, and still the supermajority of Western countries failed to take basic preventative measures. Those properties seem similar to even the slow take-off scenario. I feel like the fast-takeoff analogy would go through most strongly in a world where we'd just never seen this sort of pandemic before, but in reality we've seen many of them.
An Analytic Perspective on AI Alignment
weaker claim?

Oops yes. That's the weaker claim, which I agree with. The stronger claim is that because we can't understand something "all at once" then mechanistic transparency is too hard and so we shouldn't take Daniel's approach. But the way we understand laptops is also in a mechanistic sense. No one argues that because laptops are too hard to understand all at once, we shouldn't try to understand them mechanistically.

This seems to be assuming that we have to be able to take any complex trained AGI-as-a-neural-net
... (read more)
Rohin Shah: Okay, I think I see the miscommunication. The story you have is "the developers build a few small neural net modules that do one thing, mechanistically understand those modules, then use those modules to build newer modules that do 'bigger' things, and mechanistically understand those, and keep iterating this until they have an AGI". Does that sound right to you? If so, I agree that by following such a process the developer team could get mechanistic transparency into the neural net the same way that laptop-making companies have mechanistic transparency into laptops.

The story I took away from this post is "we do end-to-end training with regularization for modularity, and then we get out a neural net with modular structure. We then need to understand this neural net mechanistically to ensure it isn't dangerous". This seems much more analogous to needing to mechanistically understand a laptop that "fell out of the sky one day" before we had ever made a laptop.

My critiques are primarily about the second story. My critique of the first story would be that it seems like you're sacrificing a lot of competitiveness by having to develop the modules one at a time, instead of using end-to-end training.
An Analytic Perspective on AI Alignment
I'd be shocked if there was anyone to whom it was mechanistically transparent how a laptop loads a website, down to the gates in the laptop.

Could you clarify why this is an important counterpoint? It seems obviously useful to understand mechanistic details of a laptop in order to debug it. You seem to be arguing the [ETA: weaker] claim that nobody understands an entire laptop "all at once", as in, they can understand all the details in their head simultaneously. But such an understanding is almost never possible for any complex system, a... (read more)

Rohin Shah: weaker claim? This seems to be assuming that we have to be able to take any complex trained AGI-as-a-neural-net and determine whether or not it is dangerous. Under that assumption, I agree that the problem is itself very hard, and mechanistic transparency is not uniquely bad relative to other possibilities. But my point is that because it is so hard to detect whether an arbitrary neural net is dangerous, you should be trying to solve a different problem. This only depends on the claim that mechanistic transparency is hard in an absolute sense, not a relative sense (given the problem it is trying to solve). Relatedly, from Evan Hubinger [https://www.lesswrong.com/posts/J9D6Bi3eFDDhCaovi/will-transparency-help-catch-deception-perhaps-not?commentId=yn5YcLnL6vs6AxxAE]: All of the other stories for preventing catastrophe that I mentioned in the grandparent are tackling a hopefully easier problem than "detect whether an arbitrary neural net is dangerous".
Cortés, Pizarro, and Afonso as Precedents for Takeover

For my part, I think you summarized my position fairly well. However, after thinking about this argument for another few days, I have more points to add.

  • Disease seems especially likely to cause coordination failures since it's an internal threat rather than an external threat (which, unlike internal threats, tends to unite empires). We can compare the effects of the smallpox epidemic in the Aztec and Inca empires alongside other historical diseases during wartime, such as the Plague of Athens which arguably is what caused Athens to lose the Peloponnesia
... (read more)
Daniel Kokotajlo: I accept that these points are evidence in your favor. Here are some more of my own:

  • Smallpox didn't hit the Aztecs until Cortes had already killed the Emperor and allied with the Tlaxcalans, if I'm reading these summaries correctly. (I really should go read the actual books...) So it seems that Cortes did get really far on the path towards victory without the help of disease. More importantly, there doesn't seem to be any important difference in how people treated Cortes before or after the disease. They took him very seriously, underestimated him, put too much trust in him, allied with him, etc. before the disease was a factor.
  • When Pizarro arrived in Inca lands, the disease had already swept through, if I'm reading these stories right. So the period of most chaos and uncertainty was over; people were rebuilding and re-organizing.
  • Also, it wasn't actually a 90% reduction in population. It was more like a 50% reduction at the time, if I am remembering right. (Later epidemics would cause further damage, so collectively they were worse than any other plague in history.) This is comparable to e.g. the Black Death in Europe, no? But the Black Death didn't result in the collapse of most civilizations who went through it, nor did it result in random small groups of adventurers taking over governments, I predict. (I haven't actually read up on the history of it)
Cortés, Pizarro, and Afonso as Precedents for Takeover
Later, other Europeans would come along with other advantages, and they would conquer India, Persia, Vietnam, etc., evidence that while disease was a contributing factor (I certainly am not denying it helped!) it wasn't so important a factor as to render my conclusion invalid (my conclusion, again, is that a moderate technological and strategic advantage can enable a small group to take over a large region.)

Europeans conquered places such as India, but that was centuries later, after they had a large technological advantage, and they also didn't ... (read more)

Daniel Kokotajlo: The vast armadas were the result of successful colonization, not the cause of it. For example, a key battle that the British EIC won (enabling them to take over their first major territory) was the battle of Plassey, and they were significantly outnumbered during it.

Fair point about the large technological advantage, but... actually it still wasn't that large? I don't know, I'd have to look into it more, but my guess is that the tech advantage of the EIC over the Nawab at Plassey, to use the same example, was smaller than the tech advantage of Cortes and Pizarro over the Americans. I should go find out how many men the EIC had when it conquered India. I'm betting that the answer is "Far fewer than India had." And also, yeah, didn't the British steal rocket technology from India? (Mysore, I think?) That's one military important technology that they were actually behind in.
Cortés, Pizarro, and Afonso as Precedents for Takeover
I really don't think the disease thing is important enough to undermine my conclusion. For the two reasons I gave: One, Afonso didn't benefit from disease

This makes sense, but I think the case of Afonso is sufficiently different from the others that it's a bit of a stretch to use it to imply much about AI takeovers. I think if you want to make a more general point about how AI can be militarily successful, then a better point of evidence is a broad survey of historical military campaigns. Of course, it's still a historically interesting... (read more)

Daniel Kokotajlo: Again, I certainly agree that it would be good to think about things that could cause disarray as well. Like you said, maybe an AI could easily arrange for there to be a convenient pandemic at about the time it makes its move... And yeah, in light of your pushback I'm thinking of moderating my thesis to add the "disarray background condition" caveat. (I already edited the OP.) This does weaken the claim, but not much, I think, because the sort of disarray needed is relatively common, I think.

For purposes of the Cortes and Pizarro takeover, what mattered was that they were able to find local factions willing to ally with them to overthrow the main power structures. The population count wasn't super relevant because, disease or no, it was several orders of magnitude more than Cortez & Pizarro had. And while it's true that without the disease they may have had a harder time finding local factions willing to ally with them, it's not obviously true, and moreover there are plenty of ordinary circumstances (ordinary civil wars, ordinary periods of unrest and rebellion, ordinary wars between great powers) that lead to the same result: local factions being willing to ally with an outsider to overthrow the main power structure.

This conversation has definitely made me less confident in my conclusion. I now think it would be worth it for me (or someone) to go do a bunch of history reading, to evaluate these debates with more information.
Cortés, Pizarro, and Afonso as Precedents for Takeover
I agree that it would be good to think about how AI might create devastating pandemics. I suspect it wouldn't be that hard to do, for an AI that is generally smarter than us. However, I think my original point still stands.

It's worth clarifying exactly which "original point" still stands, because I'm currently unsure.

I don't get why you think a small technologically primitive tribe could take over the world if they were immune to disease. Seems very implausible to me.

Sorry, I meant to say, "Were immune to diseases that were curre... (read more)

Daniel Kokotajlo: My original point was that sometimes, a small group can reliably take over a large region despite being vastly outnumbered and outgunned, having only slightly better tech and cunning, knowing very little about the region to be conquered, and being disunited. This is in the context of arguments about how much of a lead in AI tech one needs to have to take over the world, and how big of an entity one needs to be to do it (e.g. can a rogue AI do it? What about a corporation? A nation-state?) Even with your point about disease, it still seems I'm right about this, for reasons I've mentioned (the 90% argument).

I really don't think the disease thing is important enough to undermine my conclusion. For the two reasons I gave: One, Afonso didn't benefit from disease, and two, the 90% argument: Suppose there was no disease but instead the Aztecs and Incas were 90% smaller in population and also in the middle of civil war. Same result would have happened, and it still would have proved my point.

I don't think a group of Incans in Spain could have taken it over if 90% of the Spaniards were dying of disease. I think they wouldn't have had the technology or experience necessary to succeed.
Cortés, Pizarro, and Afonso as Precedents for Takeover

Here's what I'll be putting in the Alignment Newsletter about this piece. Let me know if you spot inaccuracies or lingering disagreement regarding the opinion section.

Summary:

This post lists three historical examples of how small human groups conquered large parts of the world, and shows how they are arguably precedents for AI takeover scenarios. The first two historical examples are the conquests of American civilizations by Hernán Cortés and Francisco Pizarro in the early 16th century. The third example is the Portuguese capture of key
... (read more)
Daniel Kokotajlo: Thanks! Well, I still disagree with your opinion on it, for reasons mentioned above. To the point about "only" conquering ports, well, I think my explanations fit fine with that too -- the technological and experience advantages that (I claim) enabled Afonso to win were primarily naval in nature. Later, other Europeans would come along with other advantages, and they would conquer India, Persia, Vietnam, etc., evidence that while disease was a contributing factor (I certainly am not denying it helped!) it wasn't so important a factor as to render my conclusion invalid (my conclusion, again, is that a moderate technological and strategic advantage can enable a small group to take over a large region.)
Cortés, Pizarro, and Afonso as Precedents for Takeover

[ETA: Another way of framing my disagreement is that if you are trying to argue that small groups can take over the world, it seems almost completely irrelevant to focus on relative strategic or technological advantages in light of these historical examples. For instance, it could have theoretically been that some small technologically primitive tribe took over the world if they had some sort of immunity to disease. This would seem to imply that the relative strategic advantage of Europeans vs. Americans was not that important. Instead we should focus on what... (read more)

Daniel Kokotajlo: I agree that it would be good to think about how AI might create devastating pandemics. I suspect it wouldn't be that hard to do, for an AI that is generally smarter than us. However, I think my original point still stands.

I don't get why you think a small technologically primitive tribe could take over the world if they were immune to disease. Seems very implausible to me.

What difference does it make whether he conquered civilizations or ports? He did a lot of conquering despite being vastly outnumbered. This shows that "on paper" stats like army size are not super useful for determining who is likely to win a fight, at least when one side has a tech+strategic advantage. (Also, Malacca at least was a civilization in its own right; it was a city-state with a much bigger population and military than Afonso had.)

I agree that successful military campaigns are common in history. I think sometimes they can be attributed to luck, or else to genius. I chose these three case studies because they are so close to each other in time and space that they didn't seem like they could be luck or genius. I admit, however, that as lucy.ea8 said in their comment, perhaps cortes+pizarro won due to disease and then we can say Afonso was lucky or genius without stretching credibility. But I don't want to do this yet, because it seems to me that even with disease factored in, "most" of the "credit" for Cortes and Pizarro's success goes to the factors I mentioned. After all, suppose the disease reduced the on-paper strength of the Americans by 90%. They were still several orders of magnitude stronger than Cortes and Pizarro. So it's still surprising that Cortes/Pizarro won... until we factor in the technological and strategic advantages I mentioned.

But the civilizations wouldn't have been destroyed without the Spaniards. (I might be wrong about this, but... hadn't the disease mostly swept through Inca territory by the time Pizarro arrived? So clearly their civilization had surviv
Cortés, Pizarro, and Afonso as Precedents for Takeover

Very interesting post! However, I have a big disagreement with your interpretation of why the European conquerors succeeded in America, and I think that it undermines much of your conclusion.

In your section titled "What explains these devastating takeovers?" you cite technology and strategic ability, but Old World diseases destroyed the communities in America before the European invaders arrived, most notably smallpox, but also measles, influenza, typhus and the bubonic plague. My reading of historians (from Charles Mann's book 1493, to Alfr... (read more)

This is a good critique; thank you.

I have two responses, and then a few nitpicks.

First response: Disease wasn't a part of Afonso's success. It helped the Europeans take over the Americas but did not help them take over Africa, Asia, or the Middle East; this suggests to me that it may have been a contributing factor but was not the primary explanation / was not strictly necessary.

Second response: Even if we decide that Cortes and Pizarro wouldn't have been able to succeed without the disease, my overall conclusion still stands. This is beca... (read more)

Coherence arguments do not entail goal-directed behavior

See also Alex Turner's work on formalizing instrumentally convergent goals, and his walkthrough of the MIRI paper.

Issa Rice: Can you say more about Alex Turner's formalism? For example, are there conditions in his paper or post similar to the conditions I named for Theorem 2 above? If so, what do they say and where can I find them in the paper or post? If not, how does the paper avoid the twitching robot from seeking convergent instrumental goals?
An Analytic Perspective on AI Alignment
That's not what I said.

That's fair. I didn't actually quite understand what your position was and was trying to clarify.

An Analytic Perspective on AI Alignment
I think it's plausible that there will be a simple basin that we can regularise an AGI into, because I have some ideas about how to do it, and because the world hasn't thought very hard about the problem yet (meaning the lack of extant solutions is to some extent explained away).

That makes sense. More pessimistically, one could imagine that the reason why no one has thought very hard about it is because in practice, it doesn't really help you that much to have a mechanistic understanding of a neural network in order to do useful work. Though... (read more)

2DanielFilan2yFWIW I take this work on 'circuits' in an image recognition CNN [https://distill.pub/2020/circuits/zoom-in/] to be a bullish indicator for the possibility of mechanistic transparency.
1DanielFilan2yI think I just think the 'market' here is 'inefficient'? Like, I think this just isn't a thing that people have really thought of, and those that have have gained semi-useful insight into neural networks by doing similar things (e.g. figuring out that adding a picture of a baseball to a whale fin will cause a network to misclassify the image as a great white shark [https://distill.pub/2019/activation-atlas/]). It also seems to me that recognition tasks (as opposed to planning/reasoning tasks) are going to be the hardest to get this kind of mechanistic transparency for, and also the kinds of tasks where transparency is easiest and ML systems are best. I think I understand what you mean here, but also think that there can be tricks that reduce computational cost that have some sort of mathematical backbone - it seems to me that this is common in the study of algorithms. Note also that we don't have to understand all possible real-world intelligent machines, just the ones that we build, making the requirement less stringent.
2DanielFilan2yI'll just respond to the easy part of this for now. That's not what I said. Because it takes ages to scroll down to comments and I'm on my phone, I can't easily link to the relevant comments, but basically I said that rationality is probably as formalisable as electromagnetism, but that theories as precise as that of liberalism can still be reasoned about and built on.
An Analytic Perspective on AI Alignment

I greatly appreciate you writing your thoughts up. I have a few questions about your agenda/optimism regarding particular approaches.

The type of transparency that I’m most excited about is mechanistic, in a sense that I’ve described elsewhere.

Let me know if you'd agree with the following. The mechanistic approach is about understanding the internal structure of a program and how it behaves on arbitrary inputs. Mechanistic transparency is quite different from the more typical meaning of interpretability, where we would like to know why an AI d... (read more)

2DanielFilan2yI agree with your sentence about the mechanistic approach. I think the word "interpretable" has very little specific meaning, but most work is about particular inputs. I agree that your examples divide up into what I would consider mechanistically transparent vs not, depending on exactly how large the decision tree is, but I can't speak to whether they all count as "interpretable". I think it's plausible that there will be a simple basin that we can regularise an AGI into, because I have some ideas about how to do it, and because the world hasn't thought very hard about the problem yet (meaning the lack of extant solutions is to some extent explained away). I also think that there exists a relatively simple mathematical backbone to intelligence to be found (but not that all intelligent systems have this backbone), because I think promising progress has been made in mathematising a bunch of relevant concepts (see probability theory, utility theory, AIXI, reflective oracles). But this might be a bias from 'growing up' academically in Marcus Hutter's lab. You haven't deployed a system, don't know the kinds of situations it might encounter, and want reason to believe that it will perform well (e.g. by not trying to kill everyone) in these situations that you can't simulate. That being said, I have the feeling that this answer isn't satisfactorily detailed, so maybe you want more detail, or are thinking of a critique I haven't thought of? In this situation, the first answer is more likely to reveal some specific high-level mistakes the player might make, and provides affordance for a chess player to give advice for how to improve. The second answer seems like it's more amenable to mathematical analysis, generalises better across boards, is less likely to be confabulated, and provides a better handle for how to directly improve the algorithm (basically, read forward more than one move). So I guess the first answer better reveals chess mistakes, and the second better reveals
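Matthew's two candidate answers for the chess player are cut off above, so the following is only a guess at the intended contrast; the parenthetical "read forward more than one move" suggests something like this minimal sketch (assuming the third-party python-chess package; the function names and the material-only evaluation are illustrative, not anyone's actual proposal):

```python
import chess  # third-party package: `pip install python-chess`

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}


def material(board: chess.Board, color: chess.Color) -> int:
    """Material balance from `color`'s point of view."""
    total = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        total += value if piece.color == color else -value
    return total


def myopic_move(board: chess.Board) -> chess.Move:
    """First style of answer: grab the most valuable piece available right
    now, ignoring the opponent's reply entirely."""
    def immediate_gain(move: chess.Move) -> int:
        victim = board.piece_at(move.to_square)
        return PIECE_VALUES[victim.piece_type] if victim else 0
    return max(board.legal_moves, key=immediate_gain)


def lookahead_move(board: chess.Board) -> chess.Move:
    """Second style of answer, 'read forward more than one move': score each
    candidate by the material balance after the opponent's best reply
    (a two-ply minimax on material)."""
    us = board.turn

    def score_of_reply(reply: chess.Move) -> int:
        board.push(reply)
        score = material(board, us)
        board.pop()
        return score

    def score_after_best_reply(move: chess.Move) -> int:
        board.push(move)
        replies = list(board.legal_moves)
        # If the opponent has no reply (mate or stalemate), just score the position.
        score = min(map(score_of_reply, replies)) if replies else material(board, us)
        board.pop()
        return score

    return max(board.legal_moves, key=score_after_best_reply)
```

The point in the comment maps onto the sketch directly: myopic_move is the kind of rule that is easy to narrate but hard to analyse, while lookahead_move generalises across boards and comes with an obvious handle for improvement (increase the search depth).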
[AN #80]: Why AI risk might be solved without additional intervention from longtermists
see above about trying to conform with the way terms are used, rather than defining terms and trying to drag everyone else along.

This seems odd given your objection to "soft/slow" takeoff usage and your advocacy of "continuous takeoff" ;)

2Rohin Shah2yI don't think "soft/slow takeoff" has a canonical meaning -- some people (e.g. Paul) interpret it as not having discontinuities, while others interpret it as capabilities increasing slowly past human intelligence over (say) centuries (e.g. Superintelligence). If I say "slow takeoff" I don't know which one the listener is going to hear it as. (And if I had to guess, I'd expect they think about the centuries-long version, which is usually not the one I mean.) In contrast, I think "AI risk" has a much more canonical meaning, in that if I say "AI risk" I expect most listeners to interpret it as accidental risk caused by the AI system optimizing for goals that are not our own. (Perhaps an important point is that I'm trying to communicate to a much wider audience than the people who read all the Alignment Forum posts and comments. I'd feel more okay about "slow takeoff" if I was just speaking to people who have read many of the posts already arguing about takeoff speeds.)
[AN #80]: Why AI risk might be solved without additional intervention from longtermists
Does this make sense to you?

Yeah, that makes sense. Your points about "bio" not being short for "biological" were valid, but the fact that I, as a listener, didn't know that suggests it's really easy to get the language usage wrong here. I'm starting to think that the real fight should be against using terms that aren't self-explanatory.

Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?

I'm not sure whether it would have be... (read more)

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

I agree that this is troubling, though I think it's similar to how I wouldn't want the term "biorisk" to be expanded to include biodiversity loss (a risk, but not the right type), regular human terrorism (humans are biological, but it's a totally different issue), zombie uprisings (they are biological, but it's totally ridiculous), alien invasions, etc.

Not to say that's what you are doing with AI risk. I'm worried about what others will do with it if the term gets expanded.

2Wei Dai2yWell as I said, natural language doesn't have to be perfectly logical, and I think "biorisk" is somewhat in that category, but there's an explanation that makes it a bit more reasonable than it might first appear, which is that the "bio" refers not to "biological" but to "bioweapon". This is actually one of the definitions that Google gives [https://www.google.com/search?q=bio-] when you search for "bio": "relating to or involving the use of toxic biological or biochemical substances as weapons of war. 'bioterrorism'" I guess the analogous thing would be if we start using "AI" to mean "technical AI accidents" in a bunch of phrases, which feels worse to me than the "bio" case, maybe because "AI" is a standalone word/acronym instead of a prefix? Does this make sense to you? But the term was expanded from the beginning. Have you actually observed it being used in ways that you fear (and which would be prevented if we were to redefine it more narrowly)?