In this post, I provide some counterarguments to the arguments in Part 1. I then introduce two additional reasons why one may believe existential threats from AI systems are unlikely. 

Recap

In my first post, I outlined 5 ½ reasons why someone may believe the probability of AI X-Risk over the next few decades is low if we continue current approaches to training AI systems. To recap, these were:

  • Superintelligent AI won’t pursue a goal that results in harm to humans: the framing of a superintelligent AI as a goal-directed agent that is willing to sacrifice much of what humans care about is somehow flawed.
  • The current deep learning paradigm lacks a necessary ingredient: how we currently train large ML models will not result in superintelligence as we are missing some critical ingredients, such as “embodiment” or certain types of sense data.
  • We will run into resource constraints before we reach superintelligent AI: we will run out of high-quality human-annotated data or compute.
  • There are economic disincentives for developing dangerous AI: people will stop developing increasingly powerful AI as GDP growth will be bottlenecked by hard-to-automate things such as fine motor skills or human connection.
  • We will get nice AI by default: the dominant ML paradigm (pre-train a large model on an unsupervised predict-the-next-datum task, finetune/do RL with some human-labeled data) will result in an AI that agrees with our values to a sufficient extent.
  • Bonus: AI takeover is good: if AI wipes out humanity, it will likely be a worthy “descendant,” a natural continuation of human evolution.

Reasons to be skeptical

I find the resource constraint argument the strongest of the arguments above. There is a high level of uncertainty about the compute requirements for superintelligent AI, and there is no guarantee that Moore’s Law will continue.

I think it’s sensible to be much less than 100% (or even 50%) certain that AI will cause an existential catastrophe, largely due to some of the above arguments and the additional arguments I will add. 

However, if you use any, or a combination, of these arguments to claim that AI X-risk in the next few decades is “highly unlikely” (let’s define this as below 1%), this seems suspect. I am skeptical of the strong form of these anti-X-risk arguments for the following reasons:

Superintelligent AI won’t pursue a goal that results in harm to humans

  • Instrumental convergence: a system that can behave like a goal-directed agent seems like an attractor state of a training process that searches for an AI that is particularly good at any given task.
  • Side effects: a powerful system that can affect the world state to a significant extent will not automatically know what side effects we would consider undesirable, and encoding all of them into an AI is a hard problem.
  • Misuse: even if superintelligent AI is not goal-directed by default, a malicious actor could leverage its capabilities to pursue a harmful goal.
  • Dangerous inputs: even if, in most situations, the AI does not behave like a dangerous goal-directed agent, there could be some inputs that trigger such behavior, for instance, by getting the model to simulate a malign agent.

The current deep learning paradigm lacks a necessary ingredient

  • Internet corpora contain an immense amount of information about the world, and capturing additional sense data using cameras and sensors is reasonably easy.
  • We are not yet training on all the image and video data on the internet.
  • There has been a lot of success using AI-generated data for training. This could be a good way of increasing the signal-to-noise ratio of the training data (as raw scraped content can be very noisy and repetitive). 
  • Given enough optimization pressure, the world's physical properties can be inferred from the text and image data we use for training models today.
  • Millions of scientific research papers (2.3 million on arXiv) can be included in an AI's training data to facilitate a deep understanding of human science.

We will run into resource constraints before we reach superintelligent AI

  • Progress is being made on developing more resource-efficient training methods in industry and the open source community.
  • Hardware is still improving, with companies trying to develop more optimized chips for training ML models.
  • The capabilities of a model can be increased by augmenting it with additional memory, tools such as a code interpreter or internet browsing, and fine-tuning / better prompt engineering. 
  • A large proportion of GDP has not yet been diverted to training AI models, but it could be if more companies come to believe this is an economically favorable strategy.

There are economic disincentives for developing dangerous AI

  • The fact that certain tasks will be significantly harder to automate, resulting in bottlenecks, does not necessarily imply a lack of incentive to continue scaling. In many cases, physical and intellectual work is fungible. For instance, AIs can make manual labor more efficient by inventing better machinery and manufacturing techniques.  
  • Investors seem willing to pour huge sums into training more powerful AIs without any guarantee of particular results. For example, Inflection AI, a recent startup, raised $1.3 billion to build a more advanced chatbot.
  • An AI that does not unlock transformative GDP growth can still be dangerous.

We will get nice AI by default

  • The more powerful a system, the more a small mistake can have a large negative impact. 
  • Even if many human concepts are automatically encoded in AIs trained per the standard paradigm, this does not mean an artificial system will aim for the same goals and values as humans by default. 
  • What it means to optimize for human values at above-human levels of intelligence is not necessarily well captured by the training data fed to ML algorithms.
  • Training corpora contain many examples of bad and immoral behavior, so AIs will be capable of simulating immoral agents.

Two more reasons why AI X-risk could be less likely

Alignment will succeed

This argument claims that although superintelligent AI may be dangerous by default, we will make sufficient progress in alignment research and determine how to build aligned AI.

Reasons to believe alignment is likely to succeed include:

  • We could build a roughly human-level automated alignment researcher, then use vast amounts of compute to scale our alignment efforts and iteratively align superintelligence.
  • We may find a way to use weaker systems to supervise and align stronger ones, and it may be easier to align the weaker ones.
  • We could find a setup where human feedback is used and generalized more effectively and robustly.
  • Any, or a combination of, the existing alignment proposals could work sufficiently well. This could be particularly likely if you believe that some of the work is done by default, e.g., if AIs trained on human-generated data already encode most of the concepts we care about in an easy-to-map way. 

We should acknowledge uncertainty more

Prior probabilities should dominate our predictions because we are reasoning under high uncertainty. We have not yet built the types of AI systems that could pose an existential threat, so we cannot collect direct empirical data about them. Many of the models and arguments used when reasoning about AI could turn out to be flawed due to misunderstandings about the nature of more powerful intelligences, the theory of deep learning, or other unknown unknowns.

If the evidence for and against X-Risk is seen as very weak, the dominant factor in your estimate is your prior, uninformed belief. Of course, the question is, “What is the correct reference class for advanced AI?” Suppose you look at base rates of transformative technologies causing significant harm to humanity. In that case, you can observe that technological breakthroughs have, in general, greatly improved the average quality of life, despite fearmongering at the time of their invention. However, it’s hard to assess to what extent there have been counterfactual catastrophes. For example, there were instances when a nuclear war could have been triggered, and the snap decisions of a few individuals saved us. On the other hand, technologies developed specifically as weapons could fall outside the reference class.
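
As a minimal illustration of the point about weak evidence (with made-up numbers purely for the sake of the example), consider Bayes’ rule in odds form:

\[
\underbrace{\frac{P(\text{risk}\mid E)}{P(\neg\text{risk}\mid E)}}_{\text{posterior odds}} \;=\; \underbrace{\frac{P(E\mid \text{risk})}{P(E\mid \neg\text{risk})}}_{\text{likelihood ratio}} \times \underbrace{\frac{P(\text{risk})}{P(\neg\text{risk})}}_{\text{prior odds}}
\]

If the available arguments only justify a likelihood ratio somewhere between 0.5 and 2, then a 1% prior (odds of 1:99) moves to a posterior between roughly 0.5% and 2%: the prior, not the evidence, is doing almost all of the work.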

You could also operate with a reference class of “what happens to dumber things when more intelligent things appear” and observe that humans have not wiped out chimps, and chimps have not wiped out ants. Humans have indeed caused the extinction of some species, such as the Dodo, Tasmanian Tiger, and Steller's Sea Cow. However, most of these instances were caused by hunting, and people have become more interested in animal wellbeing and preserving biodiversity as average intelligence and education levels have risen.

Fake “reasons” why AI X-risk is unlikely

Of course, some people dismiss concerns around AI X-risk for reasons that aren’t worth writing about, as they are so obviously flawed and don’t get at the core claims being made about AI risk in the first place. For completeness, I will give a few examples, but I’m not in the business of collecting more of these:

  • AI has short-term risks, such as people losing their jobs (is this bad?), biased outputs, and more fake news/disinformation. (Sure, but it’s unclear why this counts as an argument against X-risk.)
  • AI X-risk sounds like science fiction.
  • Arguments of the kind “I don’t like the type of people who care about AI X-risk.”
