Misalignment and misuse: whose values are manifest?

by KatjaGrace, Meteuphoric · 13th Nov 2020

Crossposted from world spirit sock puppet.

AI-related disasters are often categorized as involving misaligned AI, misuse, or accident, where:

  • misuse means the bad outcomes were wanted by the people involved,
  • misalignment means the bad outcomes were wanted by AI (and not by its human creators), and
  • accident means that the bad outcomes were not wanted by those in power but happened anyway due to error.

In thinking about specific scenarios, these concepts seem less helpful.

I think a likely scenario leading to bad outcomes is that AI can be made that gives a set of people things they want, at the expense of future or distant resources that those people do not care about or do not own.

For example, consider autonomous business strategizing AI systems that are profitable additions to many companies, but in the long run accrue resources and influence and really just want certain businesses to nominally succeed, resulting in a worthless future. Suppose Bob is considering whether to get a business strategizing AI for his business. It will make the difference between his business thriving and struggling, which will change his life. He suspects that within several hundred years, if this sort of thing continues, the AI systems will control everything. Bob probably doesn’t hesitate, in the way that businesses don’t hesitate to use gas vehicles even if the people involved genuinely think that climate change will be a massive catastrophe in hundreds of years.

When the business strategizing AI systems finally plough all of the resources in the universe into a host of thriving 21st Century businesses, was this misuse or misalignment or accident? The strange new values that were satisfied were those of the AI systems, but the entire outcome only happened because people like Bob chose it knowingly (let’s say). Bob liked it more than the long glorious human future where his business was less good. That sounds like misuse. Yet also in a system of many people, letting this decision fall to Bob may well have been an accident on the part of others, such as the technology’s makers or legislators.

Outcomes are the result of the interplay of choices, driven by different values. Thus it doesn’t necessarily make sense to think of them as flowing from one entity’s values or another’s. Here, AI technology created a better option for both Bob and some newly minted misaligned AI values that it also created (‘Bob has a great business, AI gets the future’), and that option was worse for the rest of the world. They chose it together, and the choice needed both Bob to be a misuser and the AI to be misaligned. But this isn’t a weird corner case; it is a natural way for the future to be destroyed in an economy.

Thanks to Joe Carlsmith for conversation leading to this post.

3 comments

  • misalignment means the bad outcomes were wanted by AI (and not by its human creators), and
  • accident means that the bad outcomes were not wanted by those in power but happened anyway due to error.

My impression was that accident just meant "the AI system's operator didn't want the bad thing to happen", so that it is a superset of misalignment.

Though I agree with the broader point that in realistic scenarios there is usually no single root cause to enable this sort of categorization.

I think that you have a 4th failure mode: Moloch.

Isn't it correct and useful to say that Bob used the misalignment in his favor? That doesn't sound exactly right, because it makes Bob look like he gamed the system instead of dealing with a tradeoff. In that context, the misalignment let Bob attack the economy in a way that was useful for him.

  • misalignment means the bad outcomes were wanted by AI (and not by its human creators), and
  • accident means that the bad outcomes were not wanted by those in power but happened anyway due to error.

My own intuition for accident in this context is that the bad part of the outcome was irrelevant to the AI, so it didn't try to push in that direction, but its other actions did so accidentally.