https://slatestarcodex.com/2019/08/27/book-review-reframing-superintelligence/



Drexler asks: what if future AI looks a lot like current AI, but better?
For example, take Google Translate. A future superintelligent Google Translate would be able to translate texts faster and better than any human translator, capturing subtleties of language beyond what even a native speaker could pick up. It might be able to understand hundreds of languages, handle complicated multilingual puns with ease, do all sorts of amazing things. But in the end, it would just be a translation app. It wouldn’t want to take over the world. It wouldn’t even “want” to become better at translating than it was already. It would just translate stuff really well.
...
In this future, our AI technology would have taken the same path as our physical technology. The human body can run fast, lift weights, and fight off enemies. But the automobile, crane, and gun are three different machines. Evolution had to cram running-ability, lifting-ability, and fighting-ability into the same body, but humans had more options and were able to do better by separating them out. In the same way, evolution had to cram book-writing, technology-inventing, and strategic-planning into the same kind of intelligence – an intelligence that also has associated goals and drives. But humans don’t have to do that, and we probably won’t. We’re not doing it today in 2019, when Google Translate and AlphaGo are two different AIs; there’s no reason to write a single AI that both translates languages and plays Go. And we probably won’t do it in the superintelligent future either. Any assumption that we will is based more on anthropomorphism than on a true understanding of intelligence.
These superintelligent services would be safer than general-purpose superintelligent agents. General-purpose superintelligent agents (from here on: agents) would need a human-like structure of goals and desires to operate independently in the world; Bostrom has explained ways this is likely to go wrong. AI services would just sit around algorithmically mapping inputs to outputs in a specific domain.

A takeaway:


I think Drexler’s basic insight is that Bostromian agents need to be really different from our current paradigm to do any of the things Bostrom predicts. A paperclip maximizer built on current technology would have to eat gigabytes of training data about various ways people have tried to get paperclips in the past so it can build a model that lets it predict what works. It would build the model on its actually-existing hardware (not an agent that could adapt to much better hardware or change its hardware whenever convenient). The model would have a superintelligent understanding of the principles that had guided some things to succeed or fail in the training data, but wouldn’t be able to go far beyond them into completely new out-of-the-box strategies. It would then output some of those plans to a human, who would look them over and make paperclips 10% more effectively.
The very fact that this is less effective than the Bostromian agent suggests there will be pressure to build the Bostromian agent eventually (Drexler disagrees with this, but I don’t understand why). But this will be a very different project from AI the way it currently exists, and if AI the way it currently exists can be extended all the way to superintelligence, that would give us a way to deal with hostile superintelligences in the future.

9 comments

"Ten years ago, everyone was talking about superintelligence, the singularity, the robot apocalypse. What happened?"

What is this referencing? I was only 10 years old in 2009 but I have a strong impression that AI risk gets a lot more attention now than it did then.


Also, what are the most salient differences between CAIS and the cluster of concepts Karnofsky and others were calling "Tool AI"?

It might also be worth comparing CAIS and "tool AI" to Paul Christiano's IDA and the desiderata MIRI tends to talk about (task-directed AGI [1,2,3], mild optimization, limited AGI).

At a high level, I tend to think of Christiano and Drexler as both approaching alignment from very much the right angle, in that they're (a) trying to break apart the vague idea of "AGI reasoning" into smaller parts, and (b) shooting for a system that won't optimize harder (or more domain-generally) than we need for a given task. From conversations with Nate, one way I'd summarize MIRI-cluster disagreements with Christiano and Drexler's proposals is that MIRI people don't tend to think these proposals decompose cognitive work enough. Without a lot more decomposition/understanding, either the system as a whole won't be capable enough, or it will be capable by virtue of atomic parts that are smart enough to be dangerous, where safety is a matter of how well we can open those black boxes.

In my experience people use "tool AI" to mean a bunch of different things, including things MIRI considers very important and useful (like "only works on a limited task, rather than putting any cognitive work into more general topics or trying to open-endedly optimize the future") as well as ideas that don't seem relevant or that obscure where the hard parts of the problem probably are.

A lot of the distinction between a service and an agent seems to rest on the difference between thinking and doing. Is there a well-defined concept of action for intelligent agents?

AGIs will have a causal model of the world. If their own output is part of that model, and they work forward from there to the real-world consequences of their outputs, and they choose outputs partly based on those consequences, then it's an agent by (my) definition. The outputs are called "actions" and the consequences are called "goals". In all other cases I'd call it a service, unless I'm forgetting about some edge cases.
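To make that definition concrete, here's a minimal sketch of the distinction, assuming hypothetical `model`, `world_model`, and `goal_score` objects (none of these come from the post; they're just stand-ins):

```python
# A service maps inputs to outputs directly; an agent chooses its output by
# simulating the consequences of each candidate output in its causal model
# and scoring them against a goal.

def run_service(model, x):
    # Pure input -> output mapping; no reasoning about consequences.
    return model.predict(x)

def run_agent(world_model, goal_score, candidate_outputs):
    # Pick the output whose predicted consequences best satisfy the goal.
    best_output, best_score = None, float("-inf")
    for output in candidate_outputs:
        consequences = world_model.simulate(output)  # the output is part of the causal model
        score = goal_score(consequences)
        if score > best_score:
            best_output, best_score = output, score
    return best_output  # the "action"; the favored consequences are the "goal"
```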

A system whose only output is text on a screen can be either a service or an agent, depending on the computational process generating the text. A simple test: if there's a weird, non-obvious way to manipulate the people reading the text (in the everyday, bad-connotation sense of "manipulate"), would the system take advantage of it? Agents would do so (by default, unless they had a complicated goal involving ethics etc.); services would not by default.

Nobody knows how to build a useful AI capable of world-modeling and formulating intelligent plans but which is not an agent, although I'm personally hopeful that it might be possible by self-supervised learning (cf. Self-Supervised Learning and AGI safety).

This sounds like we're resting on an abstract generalization of 'outputs.' Is there any work being done to distinguish between different outputs, and consider how a computer might recognize a kind it doesn't already have?

Right, I was using "output" in a broad sense of "any way that the system can causally impact the rest of the world". We can divide that into "intended output channels" (text on a screen etc.) and "unintended output channels" (sending out radio signals using RAM etc.). I'm familiar with a small amount of work on avoiding unintended output channels (e.g. using homomorphic encryption or fancy vacuum-sealed Faraday cage boxes).

Usually the assumption is that a superintelligent AI will figure out what it is, and where it is, and how it works, and what all its output channels are (both intended and unintended), unless there is some strong reason to believe otherwise (example). I'm not sure this answers your question ... I'm a bit confused at what you're getting at.

I am aiming directly at questions of how an AI that starts with only a robotic arm might get to controlling drones or trading stocks, from the perspective of the AI. My intuition, driven by Moravec's Paradox, is that each new kind of output (or input) has a pretty hefty computational threshold associated with it, so I suspect that the details of the initial inputs/outputs will have a big influence on the risk any given service or agent presents.

The reason I am interested in this is that it feels like doing things has no intrinsic connection to learning things, and that we only link them because so much of our learning and doing is unconscious. That is to say, I suspect actions are orthogonal to intelligence.

Regarding "computational threshold", my working assumption is that any given capability X is either (1) always and forever out of reach of a system by design, or (2) completely useless, or (3) very likely to be learned by a system, if the system has long-term real-world goals. Maybe it takes some computational time and effort to learn it, but AIs are not lazy (unless we program them to be). AIs are just systems that make good decisions in pursuit of a goal, and if "acquiring capability X" is instrumentally helpful towards achieving goals in the world, it will probably make that decision if it can (cf. "Instrumental convergence").

If I have a life goal that is best accomplished by learning to use a forklift, I'll learn to use a forklift, right? Maybe I won't be very fluid at it, but fine, I'll operate it more slowly and deliberately, or design a forklift autopilot subsystem, or whatever...

A lot of the distinction between a service and an agent seems to rest on the difference between thinking and doing.

That doesn't seem right to me. There are several, potentially subtle differences between services and agents – the boundary (or maybe even 'boundaries') is probably nebulous at high resolution.

A good prototypical service is Google Translate. You submit text to it to translate and it outputs a translation as text. It's both thinking and doing but the 'doing' is limited – it just outputs translated text.

A good prototypical agent is AlphaGo. It pursues a goal, to win a game of Go, but does so in a (more) open-ended fashion than a service. It will continue to play as long as it can.

Down-thread, you wrote:

I am aiming directly at questions of how an AI that starts with only a robotic arm might get to controlling drones or trading stocks, from the perspective of the AI.

I think one thing to point out up-front is that a lot of current AI systems are generated or built in a stage distinct from the stage in which they 'operate'. A lot of machine learning algorithms involve a distinct period of learning, first, which produces a model. That model can then be used – as a service. The model/service would do something like 'tell me if an image is of a hot dog'. Or, in the case of AlphaGo, something like 'given a game state X, what next move or action should be taken?'.

What makes AlphaGo an agent is that its model is operated in a mode whereby it's continually fed a sequence of game states, and, crucially, both its output controls the behavior of a player in the game, and the next game state it's given depends on its previous output. It becomes embedded or embodied via the feedback between its output, player behavior, and its subsequent input, a game state that includes the consequences of its previous output.
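Here's a rough sketch of that feedback loop, with `policy`, `game`, and their methods as hypothetical stand-ins (not AlphaGo's actual interfaces): the same trained model can be queried once as a service, or wrapped in a loop where its outputs change the state it is subsequently fed.

```python
def query_as_service(policy, game_state):
    # One-shot use: given a state, return a suggested move. No feedback.
    return policy.best_move(game_state)

def run_as_agent(policy, game):
    # Embedded use: the model's output controls the player, and the next
    # state it sees includes the consequences of its previous output.
    state = game.initial_state()
    while not game.is_over(state):
        move = policy.best_move(state)
        state = game.apply(move, state)  # feedback: output shapes the next input
    return game.result(state)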

But, we're still missing yet another crucial ingredient to make an agent truly (or at least more) dangerous – 'online learning'.

Instead of training a model/service all at once up-front, we could train it while it acts as an agent or service, i.e. 'online'.
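As a sketch of that variant (again with hypothetical interfaces), the model keeps being updated from the consequences of its own actions while it is acting, rather than being trained once up-front and then frozen:

```python
def run_online_agent(policy, environment, steps):
    state = environment.reset()
    for _ in range(steps):
        action = policy.best_move(state)
        next_state, feedback = environment.step(action)
        policy.update(state, action, feedback)  # learning continues during operation
        state = next_state
```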

I would be very surprised if an AI installed to control a robotic arm would gain control of drones or be able to trade stocks, but only because I would expect such an AI not to use online learning and to be overall very limited in both the inputs it's provided with (e.g. the position of the arm and maybe a camera covering its work area) and the outputs to which it has direct access (e.g. a sequence of arm motions to be performed).

Probably the most dangerous kind of tool/service AI imagined is an oracle AI, i.e. an AI to which people would pose general open-ended questions, e.g. 'what should I do?'. For oracle AIs, I think some other (possibly) key dangerous ingredients might be present:

  • Knowledge of other oracle AIs (as a plausible stepping stone to the next ingredient)
  • Knowledge of itself as an oracle AI (and thus an important asset)
  • Knowledge of its own effects on the world, through those that consult it, or those that are otherwise aware of its existence or its output