I occasionally have some thoughts about why AGI might not be as near as a lot of people seem to think, but I'm confused about how/whether to talk about them in public.

The biggest reason for not talking about them is that one person's "here is a list of capabilities that I think an AGI would need to have, that I don't see there being progress on" is another person's "here's a roadmap of AGI capabilities that we should do focused research on". Any articulation of missing capabilities that is clear enough to be convincing also seems clear enough to get people thinking about how to achieve those capabilities.

At the same time, the community thinking that AGI is closer than it really is (if that's indeed the case) has numerous costs, including at least:

  • Immense mental health costs to a huge number of people who think that AGI is imminent
  • People at large making bad strategic decisions that end up having major costs, e.g. not putting any money in savings because they expect it to not matter soon
  • Alignment people specifically making bad strategic decisions that end up having major costs, e.g. focusing on alignment approaches that might pay off in the short term and neglecting more foundational long-term research
  • Alignment people losing credibility and getting a reputation of crying wolf once predicted AGI advances fail to materialize

Having a better model of what exactly is missing could conceivably also make it easier to predict when AGI will actually be near. But I'm not sure to what extent this is actually the case, since the development of core AGI competencies feels more like a question of insight than of grind[1], and insight seems very hard to predict.

A benefit from this that does seem more plausible would be if the analysis of capabilities gave us information that we could use to figure out what a good future landscape would look like. For example, suppose that we aren't likely to get AGI soon and that the capabilities we currently have will create a society that looks more like the one described in Comprehensive AI Services, and that such services could safely be used to detect signs of actually dangerous AGIs. If this were the case, then it would be important to know that we may want to accelerate the deployment of technologies that are taking the world in a CAIS-like direction, and possibly e.g. promote rather than oppose things like open source LLMs.

One argument would be that if AGI really isn't near, then that's going to be obvious pretty soon, and it's unlikely that my arguments in particular for this would be all that unique - someone else would be likely to make them soon anyway. But I think this argument cuts both ways - if someone else is likely to make the same arguments soon anyway, then there's also limited benefit in writing them up. (Of course, if it saves people from significant mental anguish, even just making those arguments slightly earlier seems good, so overall this argument seems like it's weakly in favor of writing up the arguments.)

  1. From Armstrong & Sotala (2012):

    Some AI predictions claim that AI will result from grind: i.e. lots of hard work and money. Others claim that AI will need special insights: new unexpected ideas that will blow the field wide open (Deutsch 2012).

    In general, we are quite good at predicting grind. Project managers and various leaders are often quite good at estimating the length of projects (as long as they’re not directly involved in the project (Buehler, Griffin, and Ross 1994)). Even for relatively creative work, people have sufficient feedback to hazard reasonable guesses. Publication dates for video games, for instance, though often over-optimistic, are generally not ridiculously erroneous—even though video games involve a lot of creative design, play-testing, art, programing the game “AI,” etc. Moore’s law could be taken as an ultimate example of grind: we expect the global efforts of many engineers across many fields to average out to a rather predictable exponential growth.

    Predicting insight, on the other hand, seems a much more daunting task. Take the Riemann hypothesis, a well-established mathematical hypothesis from 1859 (Riemann 1859). How would one go about estimating how long it would take to solve? How about the P = NP hypothesis in computing? Mathematicians seldom try to predict when major problems will be solved, because they recognize that insight is very hard to predict. And even if predictions could be attempted (the age of the Riemann hypothesis hints that it probably isn’t right on the cusp of being solved), they would need much larger error bars than grind predictions. If AI requires insights, we are also handicapped by the fact of not knowing what these insights are (unlike the Riemann hypothesis, where the hypothesis is clearly stated, and only the proof is missing). This could be mitigated somewhat if we assumed there were several different insights, each of which could separately lead to AI. But we would need good grounds to assume that.

2 Answers

Obviously I think it's worth being careful, but I think in general it's actually relatively hard to accidentally advance capabilities too much by working specifically on alignment. Some reasons:

  1. Researchers of all fields tend to do this thing where they have really strong conviction in their direction and think everyone should work on their thing. Convincing them that some other direction is better is actually pretty hard even if you're trying to shove your ideas down their throats.
  2. Often the bottleneck is not that nobody realizes that something is a bottleneck, but rather that nobody knows how to fix it. In these cases, calling attention to the bottleneck doesn't really speed things up, whereas for thinking about alignment we can reason about what things would look like if it were to be solved.
  3. It's generally harder to make progress on something by accident than to make progress on it on purpose while trying really hard. I think this is true even if there is a lot of overlap. There's also an EMH argument one could make here but I won't spell it out.

I think the alignment community thinking correctly is essential for solving alignment. Especially because we will have very limited empirical evidence before AGI, and that evidence will not be obviously directly applicable without some associated abstract argument, any trustworthy alignment solution has to route through the community reasoning sanely.

Also to be clear I think the "advancing capabilities is actually good because it gives us more information on what AGI will look like" take is very bad and I am not defending it. The arguments I made above don't apply, because they basically hinge on work on alignment not actually advancing capabilities.

From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.

If we communicate the simple idea that AGI is near then it pushes people to work on safety projects that would be good to work on even if AGI is not near while paying some costs in terms of reputation, mental health, and personal wealth.

If we communicate the simple idea that AGI is not near then people will feel less need to work on safety soon. This would let them not miss out on opportunities that would be good to take ahead of when they actually need to focus on AI safety.

We can only really communicate one thing at a time to people. Also, we should worry more about the tail risks of false positives (thinking we can build AGI safely when we cannot) than of false negatives (thinking we can't build AGI safely when we can). Taking these two facts into consideration, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.

I reached this via Joachim pointing it out as an example of someone urging epistemic defection around AI alignment, and I have to agree with him there. I think the higher difficulty posed by communicating "we think there's a substantial probability that AGI happens in the next 10 years" vs "AGI is near" is worth it even from a PR perspective, because pretending you know the day and the hour smells like bullshit to the most important people who need convincing that AI alignment is nontrivial.

Gordon Seidoh Worley, 5mo:
I left a comment over in the other thread, but I think Joachim misunderstands my position. In the above comment I've taken for granted that there's a non-trivial possibility that AGI is near. I'm not arguing that we should say "AGI is near" regardless of whether it is, because we don't know whether it is; we only have our guesses. So long as there's a non-trivial chance that AGI is near, I think that's the more important message to communicate. Overall it would be better if we could communicate something like "AGI is probably near", but "probably" and similar terms are going to get rounded off. Even if you literally say "AGI is probably near", that's not what people will hear, and my argument is that it's better if they round the "probably" off to "near" rather than "not near".
orthonormal, 5mo:
I agree with "When you say 'there's a good chance AGI is near', the general public will hear 'AGI is near'". However, the general public isn't everyone, and the people who can distinguish between the two claims are the most important to reach (per capita, and possibly in sum). So we'll do better by saying what we actually believe, while taking into account that some audiences will round probabilities off (and seeking ways to be rounded closer to the truth while still communicating accurately to anyone who does understand probabilistic claims). The marginal gain by rounding ourselves off at the start isn't worth the marginal loss by looking transparently overconfident to those who can tell the difference.
Joachim Bartosik, 5mo:
I'm replying only here because spreading discussion over multiple threads makes it harder to follow. You left a reply on a question asking how to communicate about reasons why AGI might not be near. The question refers to the costs of "the community" thinking that AI is closer than it really is as a reason to communicate about reasons it might not be so close. So I understood the question as asking about communication with the community (my guess: of people seriously working and thinking about AI-safety-as-in-AI-not-killing-everyone), where it's important to actually try to figure out the truth. You replied (as I understand) that when we communicate to the general public we can transmit only 1 idea, so we should communicate that AGI is near (if we assign not-very-low probability to that). I think the biggest problem I have is that your posting "general public communication" as a reply to a question asking about "community communication" pushes towards less clarity in the community, where I think clarity is important. I'm also not sold on the "you can communicate only one idea" thing but I mostly don't care to talk about it right now (it would be nice if someone else worked it out for me but right now I don't have the capacity to do it myself).
Gordon Seidoh Worley, 5mo:
Ah I see. I have to admit, I write a lot of my comments between things and I missed that the context of the post could cause my words to be interpreted this way. These days I'm often in executive mode rather than scholar mode and miss nuance if it's not clearly highlighted, hence my misunderstanding, but it also reflects where I'm coming from with this answer!
2 comments

IME a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning and those mistakes can be discussed without revealing capabilities ideas: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce

An alternative framing that might be useful: What do you see as the main bottleneck for people having better predictions of timelines (as you see it)?

Do you in fact think that having such a list is it?