(This has been sitting in my drafts folder since August 2017. Robin Hanson's recent How Lumpy AI Services? made me think of it again. I'm not sure why I didn't post it back then. I may have wanted to add more reasons, details and/or citations, but at this point it seems better to just post it as is. Apologies to those who may have come up with some of these arguments earlier.)

Robin Hanson recently wrote, "Recently AI risk has become something of an industry, with far more going on than I can keep track of. Many call working on it one of the most effectively altruistic things one can possibly do. But I’ve searched a bit and as far as I can tell that foom scenario is still the main reason for society to be concerned about AI risk now." (By "foom scenario" he means a local intelligence explosion where a single AI takes over the world.) In response, I list the following additional reasons to work urgently on AI alignment.

  1. Property rights are unlikely to hold up in the face of large capability differentials between humans and AIs, so even if the intelligence explosion is likely to be global rather than local, that doesn't much reduce the urgency of working on AI alignment.

  2. Making sure an AI has aligned values and strong controls against value drift is an extra constraint on the AI design process. This constraint appears likely to be very costly at both design and run time, so if the first human level AIs deployed aren't value aligned, it seems very difficult for aligned AIs to catch up and become competitive.

  3. AIs' control of the economy will grow over time. This may happen slowly in their time frame but quickly in ours, leaving little time to solve value alignment problems before human values are left with a very small share of the universe, even if property rights hold up.

  4. Once we have human-level AIs and it's really obvious that value alignment is difficult, superintelligent AIs may not be far behind. Superintelligent AIs can probably find ways to bend people's beliefs and values to their benefit (e.g., create highly effective forms of propaganda, cults, philosophical arguments, and the like). Without an equally capable, value-aligned AI to protect me, even if my property rights are technically secure, I don't know how I would secure my mind.


Even if there will be problems worth working on at some point, if we will know a lot more later and if resources today can be traded for a lot more resources later, the temptation to wait should be strong. The foom scenario has few visible indications of a problem looming, forcing one to work on the problems far ahead of time. But in scenarios where there's warning, lots more resources, and better tools and understanding later, waiting makes a lot more sense.

Conditional on the nonfoom scenario, what is the appropriate indication that you should notice, to start converting resources into work?

In a world where there may or may not be a foom, how likely does foom need to be for it to be correct to start working sooner?

I think the answer to the first question is that, as with every other (important) industry, the people in that industry will have the time and skill to notice the problems and start working on them. The FOOM argument says that a small group will form a singleton quickly, and so we need to do something special to ensure it goes well, and the non-FOOM argument is that AI is an industry like most others, and like most others it will not take over the world in a matter of months.

Where do you draw the line between "the people in that industry will have the time and skill to notice the problems and start working on them" and what is happening now, which is: some people in the industry (at least, you can't argue DeepMind and OpenAI are not in the industry) noticed there is a problem and started working on it? Is it an accurate representation of the no-foom position to say that we should only start worrying when we literally observe a superhuman AI trying to take over the world? What if AI takes years to gradually push humans to the sidelines, but the process is unstoppable because that time is not enough to solve alignment from scratch and the economic incentives to keep employing and developing AI are too strong to fight against?

Solving problems is mostly a matter of total resources devoted, not time devoted. Yes some problems have intrinsic clocks, but this doesn't look like such a problem. If we get signs of a problem looming, and can devote a lot of resources then, that makes it tempting to save resources today for such a future push, as we'll know a lot more then and resources today become more resources when delayed.
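A minimal arithmetic sketch of the "save now, spend at the warning sign" tradeoff described above; the growth rate and lead time are made-up numbers for illustration, not anything from the comment:

```python
# Toy comparison of spending one unit of resources now vs. investing it and
# spending the larger amount once a clear warning sign appears.
GROWTH_RATE = 0.05        # assumed annual return on saved resources
YEARS_UNTIL_WARNING = 20  # assumed lead time before a clear warning sign

resources_if_spent_now = 1.0
resources_if_saved = (1 + GROWTH_RATE) ** YEARS_UNTIL_WARNING  # about 2.65 units

# The saved option wins on raw quantity; the disagreement in this thread is over
# whether the problem has an "intrinsic clock" that makes late work worth much
# less per unit of resources.
print(round(resources_if_saved, 2))
```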

Solving problems is mostly a matter of total resources devoted, not time devoted. ... If we get signs of a problem looming, and can devote a lot of resources then.

Hmm. I don't have as strong opinions about this, but this premise doesn't seem obviously true.

I'm thinking about the "is science slowing down?" question – pouring 1000x resources into various scientific fields didn't result in 1000x speedups. In some cases progress seemed to slow down. The three main hypotheses I have are:

  • Low hanging fruit got used up, so the problems got harder
  • Average careerist scientists don't matter much, only extremely talented, naturally motivated researchers matter. The naturally motivated researchers will do the work anyway.
  • Coordination is hard and scales with the number of people coordinating. If you have 1000x the researchers in a field, they can't find each other's best work that easily.

I agree that "time spent" isn't the best metric, but it seems like what actually matters is "quality researcher hours that build on each other in the right way," and it's not obvious how much you can scale that.

If it's just the low hanging fruit hypothesis then... that's fine I guess. But if the "extreme talent/motivation" or "coordination" issues are at play, then you want (respectively) to ensure that:

a) at any given time, talented people who are naturally interested in the problem have the freedom to work on it, if there are nonzero things to do with it, since there won't be that many of them in the future.

b) better coordination tools get built, so that people in the future can scale their efforts better.

(You may also want to make efforts not to get mediocre careerist scientists involved in the field)

FWIW another reason, somewhat similar to the low hanging fruit point, is that because the remaining problems are increasingly specialized, they require more years' training before you can tackle them. I.e. not just harder to solve once you've started, but it takes longer for someone to get to the point where they can even start.

Also, I wonder if the increasing specialization means there are more problems to solve (albeit ever more niche), so people are being spread thinner among them. (Though conversely there are more people in the world, and many more scientists, than a century or two ago.)

I think that this problem is in the same broad category as "invent general relativity" or "prove the Poincaré conjecture". That is, for one thing quantity doesn't easily replace talent (you couldn't invent GR just as easily with 50 mediocre physicists instead of one Einstein), and, for another thing, the work is often hard to parallelize (50 Einsteins wouldn't invent GR 50 times as fast). So, you can't solve it just by spending lots of resources in a short time frame.

Yeah, I agree with this view and I believe it's the most common view among MIRI folks.

In software development, a perhaps relevant kind of problem solving, extra resources in the form of more programmers working on the same project don't speed things up much. My guesstimate is output ≈ time × log(programmers). I assume the main reason is that there's a limit to the extent that you can divide a project into independent parallel programming tasks. (Cf. 9 women can't make a baby in 1 month.)
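As a throwaway illustration of that guesstimate (the log base and the "+1" offset are arbitrary choices for the sketch, not part of the original claim):

```python
import math

def project_output(months: float, programmers: int) -> float:
    """Toy version of the guesstimate above: output ~ time * log(programmers)."""
    return months * math.log2(programmers + 1)

# Doubling calendar time doubles output in this model, but going from 10 to 100
# programmers only raises output by a factor of about 1.9 (log2(101)/log2(11)).
for team_size in (1, 10, 100):
    print(team_size, round(project_output(12, team_size), 1))
```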

Except that if the people are working in independent smaller teams, each trying to crack the same problem, and *if* the solution requires a single breakthrough (or a few?) which can be made by a smaller team (e.g. public key encryption, as opposed to landing a man on the moon), then presumably it's proportional to the number of teams, because each has an independent probability of making the breakthrough. And it seems plausible that solving AI threats might be more like this.
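For what it's worth, the standard independence calculation behind "proportional to the number of teams" looks like this (the 2% per-team chance is an arbitrary placeholder):

```python
def p_any_breakthrough(p_per_team: float, n_teams: int) -> float:
    """Chance that at least one of n independent teams makes the breakthrough."""
    return 1 - (1 - p_per_team) ** n_teams

# For a small per-team chance the total is roughly proportional to the number
# of teams (1 - (1-p)^n ~= n*p), though it saturates as n gets large.
for n in (1, 5, 10, 50):
    print(n, round(p_any_breakthrough(0.02, n), 3))
```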

If you agree that there will be problems worth working on at some point, then when to start working on them becomes a judgment call about how hard the problems are, which warning sign will leave enough time to solve them, how much better tools and understanding will get in the future (without us working specifically to improve such tools/understanding), and how current resources trade against future resources. If you agree with this, I suggest that another reason for urgency besides foom is a judgment that we've already passed the warning signs that make it worthwhile to work on the problems. (There are people, such as Paul Christiano, who don't think foom is highly likely, who almost certainly have a good understanding of the tradeoffs you bring up here, and who nevertheless think it's urgent to work on alignment.) You might disagree with this judgment but it seems wrong to say "foom scenario is still the main reason for society to be concerned about AI risk now". (Unless you're saying something like, according to your own inside view, foom is the best argument for urgency on AI risk, but I'm assuming you're talking about other people's motivations?)

By "main reason for concern" I mean best arguments; I'm not trying to categorize people's motivations.

AGI isn't remotely close, and I just don't believe people who think they see signs of that. Yes for any problem that we'll eventually want to work on, a few people should work on it now just so someone is tracking the problem, ready to tell the rest of us if they see signs of it coming soon. But I see people calling for much more than that minimal tracking effort.

Most people who work in research areas call for more relative funding for their areas. So the rest of us just can't be in the habit of believing such calls. We must hold a higher standard than "people who get $ to work on this say more $ should go to this now."

AGI isn’t remotely close, and I just don’t believe people who think they see signs of that.

You don't seem to believe in foom either, but you're at least willing to mention it as a reason some people give for urgency and even engage in extensive debates about it. I don't understand how "no foom, but AGI may be close enough that it's worthwhile to do substantial alignment work now" could be so much less likely in your mind than foom that it's not even worth mentioning as a reason that some other (seemingly smart and sane) people give for urgency.

Most people who work in research areas call for more relative funding for their areas. So the rest of us just can’t be in the habit of believing such calls.

What do you propose that "the rest of us" do? I guess some of us can try to evaluate the object-level arguments ourselves, but what about those who lack the domain knowledge or even the raw intelligence to do that? (This is not a rhetorical question; I actually don't know.)

We must hold a higher standard than “people who get $ to work on this say more $ should go to this now.”

I'm pretty sure Paul can make more money by going into some other line of work than AI safety, plus he's actually spending his own money to fund AI alignment research by others. I personally do not get $ to work on this (except by winning some informal prizes funded by Paul, which fall far short of covering the value of the time I've spent on the topic) and I plan to keep it that way for the foreseeable future. (Of course we're still fairly likely to be biased for other reasons.)

ETA, it looks like you added this part to your comment after I typed the above:

By “main reason for concern” I mean best arguments; I’m not trying to categorize people’s motivations.

Ok, that was not clear, since you did present a Twitter poll in the same post asking about "motives for AI risk concern".

Can you point to a good/best argument for the claim that AGI is coming soon enough to justify lots of effort today?

I'm not actually aware of a really good argument for AGI coming soon (i.e., within the next few decades). As far as I can tell, most people use their own intuitions and/or surveys of AI researchers (both of which are of course likely to be biased). My sense is that it's hard to reason explicitly about AGI timelines (in a way that's good enough to be more trustworthy than intuitions/surveys), and there seem to be enough people concerned about foom and/or short timelines that funding isn't a big constraint, so there's not much incentive for AI risk people to spend time on making such explicit arguments. (ETA: Although I could well be wrong about this, and there's a good argument somewhere that I'm not aware of.) To give a sense of how people are thinking about this, I'll quote a Paul Christiano interview:

I normally think about this question in terms of what’s the probability of some particular development by 10 or 20 years rather than thinking about a median because those seem like the most decision relevant numbers, basically. Maybe one could also, if you had very short timelines give probabilities on less than 10 years. I think that my probability for human labor being obsolete within 10 years is probably something in the ballpark of 15%, and within 20 years is something within the ballpark of 35%. AI would then have, prior to human labor being obsolete, you have some window of maybe a few years during which stuff is already getting quite extremely crazy. Probably AI [risk 01:09:04] becomes a big deal. We can have permanently have sunk the ship like somewhat before, one to two years before, we actually have human labor being obsolete.

Those are my current best guesses. I feel super uncertain about … I have numbers off hand because I’ve been asked before, but I still feel very uncertain about those numbers. I think it’s quite likely they’ll change over the coming year. Not just because new evidence comes in, but also because I continue to reflect on my views. I feel like a lot of people, whose views I think are quite reasonable, who push for numbers both higher and lower, or there are a lot of people making reasonable arguments for numbers both much, like shorter timelines than that and longer timelines than that.

Overall, I come away pretty confused with why people currently are as confident as they are in their views. I think compared to the world at large, the view I’ve described is incredibly aggressive, incredibly soon. I think compared to the community of people who think about this a lot, I’m more somewhere in, I’m still on the middle of the distribution. But amongst people whose thinking I most respect, maybe I’m somewhere in the middle of the distribution. I don’t quite understand why people come away with much higher or much lower numbers than that. I don’t have a good … It seems to me like the arguments people are making on both sides are really quite shaky. I can totally imagine that after doing … After being more thoughtful, I would come away with higher or lower numbers, but I don’t feel convinced that people who are much more confident one way or the other have actually done the kind of analysis that I should defer to them on. That’s said, I also I don’t think I’ve done the kind of analysis that other people should really be deferring to me on.

My own thinking here is that even if AGI comes a century or more from now, the safest alignment approaches seem to require solving a number of hard philosophical problems which may well take that long to solve even if we start now. Certainly it would be pretty hopeless if we only started when we saw a clear 10-year warning. This possibility also justifies looking more deeply into other approaches now to see if they could potentially be just as safe without solving the hard philosophical problems.

Another thought that is prompted by your question is that given funding does not seem to be the main constraint on current alignment work (people more often cite "talent"), it's not likely to be a limiting constraint in the future either, when the warning signs are even clearer. But "resources today can be traded for a lot more resources later" doesn't seem to apply if we interpret "resources" as "talent".

We have to imagine that we have some influence over the allocation of something, or there's nothing to debate here. Call it "resources" or "talent" or whatever, if there's nothing to move, there's nothing to discuss.

I'm skeptical solving hard philosophical problems will be of much use here. Once we see the actual form of relevant systems then we can do lots of useful work on concrete variations.

I'd call "human labor being obsolete within 10 years … 15%, and within 20 years … 35%" crazy extreme predictions, and happily bet against them.

If we look at direct economic impact, we've had a pretty steady trend for at least a century of jobs displaced by automation, and the continuation of past trend puts full AGI a long way off. So you need a huge unprecedented foom-like lump of innovation to have change that big that soon.

We have to imagine that we have some influence over the allocation of something, or there’s nothing to debate here. Call it “resources” or “talent” or whatever, if there’s nothing to move, there’s nothing to discuss.

Let me rephrase my argument to be clearer. You suggested earlier, "and if resources today can be traded for a lot more resources later, the temptation to wait should be strong." This advice could be directed at either funders or researchers (or both). It doesn't seem to make sense for researchers, since they can't, by not working on AI alignment today, cause more AI alignment researchers to appear in the future. And I think a funder should think, "There will be plenty of funding for AI alignment research in the future when there are clearer warning signs. I could save and invest this money, and spend it in the future on alignment, but it will just be adding to the future pool of funding, and the marginal utility will be pretty low because at the margins, it will be hard to turn money into qualified alignment researchers in the future just as it is hard to do that today."

So I'm saying this particular reallocation of resources that you suggested does not make sense, but the money/talent could still be reallocated some other way (for example to some other altruistic cause today). Do you have either a counterargument or another suggestion that you think is better than spending on AI alignment today?

I’m skeptical solving hard philosophical problems will be of much use here.

Have you seen my recent posts that argued for or supported this? If not I can link them: Three AI Safety Related Ideas, Two Neglected Problems in Human-AI Safety, Beyond Astronomical Waste, The Argument from Philosophical Difficulty.

Once we see the actual form of relevant systems then we can do lots of useful work on concrete variations.

Sure, but why can't philosophical work be a complement to that?

I’d call “human labor being obsolete within 10 years … 15%, and within 20 years … 35%” crazy extreme predictions, and happily bet against them.

I won't defend these numbers because I haven't put much thought into this topic personally (since my own reasons don't depend on these numbers, and I doubt that I can do much better than deferring to others). But at what probabilities would you say that substantial work on alignment today would start to be worthwhile (assuming the philosophical difficulty argument doesn't apply)? What do you think a world where such probabilities are reasonable would look like?

If there is a 50-50 chance of foom vs non-foom, and in the non-foom scenario we expect to acquire enough evidence to get an order of magnitude more funding, then to maximize the chance of a good outcome we, today, should invest in the foom scenario because the non-foom scenario can be handled by more reluctant funds.
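A toy version of that asymmetry, using made-up numbers and a crude assumption of mine that the later surge of funding dilutes the marginal value of early non-foom work proportionally:

```python
# Illustrative-only numbers: 50-50 on foom, work today yields one unit of
# progress either way, and the non-foom scenario attracts 10x more funding
# once warning signs appear.
P_FOOM = 0.5
LATE_FUNDING_MULTIPLIER = 10  # assumed surge of "more reluctant funds" if no foom

marginal_value_of_foom_prep = P_FOOM * 1.0
marginal_value_of_nonfoom_prep = (1 - P_FOOM) * 1.0 / LATE_FUNDING_MULTIPLIER

print(marginal_value_of_foom_prep, marginal_value_of_nonfoom_prep)  # 0.5 vs 0.05
```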

Related, on the EA Forum. (I am the post's author.)

It's not so much about "fast" vs. "slow" as about the chances of putting lots of resources into the problem with substantial warning. Even if things change fast, as long as you get enough warning and resources can be moved to the problem fast enough, waiting still makes sense.


Making sure an AI has aligned values and strong controls against value drift is an extra constraint on the AI design process. This constraint appears likely to be very costly at both design and run time, so if the first human level AIs deployed aren’t value aligned, it seems very difficult for aligned AIs to catch up and become competitive

Making sure that an AI has good enough controllability is very much part of the design process, because a completely uncontrollable AI is no good to anyone.

Full value alignment is different and probably much more difficult. There is a hard and an easy control problem.

... even if my property rights are technically secure, I don't know how I would secure my mind.

Training up one's concentration & present-moment awareness are probably helpful for this.

Re: scenario 3, see The Evitable Conflict, the last story in Isaac Asimov's "I, Robot":

"Stephen, how do we know what the ultimate good of Humanity will entail? We haven't at our disposal the infinite factors that the Machine has at its! Perhaps, to give you a not unfamiliar example, our entire technical civilization has created more unhappiness and misery than it has removed. Perhaps an agrarian or pastoral civilization, with less culture and less people would be better. If so, the Machines must move in that direction, preferably without telling us, since in our ignorant prejudices we only know that what we are used to, is good – and we would then fight change. Or perhaps a complete urbanization, or a completely caste-ridden society, or complete anarchy, is the answer. We don't know. Only the Machines know, and they are going there and taking us with them."

I'm not sure I understand the point of this quote in relation to what I wrote. (Keep in mind that I haven't read the story, in case the rest of the story offers the necessary context.) One guess is that you're suggesting that AIs might be more moral than humans "by default" without special effort on the part of effective altruists, so it might not be an existential disaster if AI values end up controlling most of the universe instead of human values. This seems somewhat plausible but surely isn't a reasonable mainline expectation?