Conservation of Expected Ethics isn't enough

by Stuart Armstrong, 15th Jun 2016


An idea relevant for AI control; index here.

Thanks to Jessica Taylor.

I've been playing with systems to ensure or incentivise conservation of expected ethics (CEE): the idea that if an agent estimates that utilities u and v are (for instance) equally likely to be correct, then its expected future estimates for the correctness of u and v must likewise be equal. In other words, it can try to get more information, but it can't bias the direction of the update.
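One way to formalise the condition (my phrasing, not the post's): writing p_t for the agent's credence at time t that utility u is the correct one, CEE is a martingale requirement on that credence:

```latex
\mathbb{E}\left[\, p_{t+1} \mid p_t \,\right] = p_t
```

Updates may be arbitrarily informative, but they must be unbiased in expectation.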

Unfortunately, CEE isn't enough. Here are a few decisions the AI can take that respect CEE. Imagine that the conditions of update relied on, for instance, humans answering questions:

1. Don't ask.
2. Ask casually.
3. Ask emphatically.
4. Build a robot that randomly rewires humans to answer one way or the other.
5. Build a robot that observes humans, figures out which way they're going to answer, then rewires them to answer the opposite way.

All of these conserve CEE, but, obviously, the last two options are not ideal...
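A quick numerical illustration of the problem (my sketch, not from the post; the answer distribution and posteriors are assumptions): even the random-rewiring robot satisfies CEE, because the agent's expected posterior still equals its prior, despite the answer carrying no ethical information at all.

```python
def expected_posterior(answer_dist):
    """Expected future credence in u, averaging over possible answers.

    answer_dist maps each possible answer to a pair
    (probability of that answer, posterior credence in u given that answer).
    """
    return sum(p * post for p, post in answer_dist.values())

prior_u = 0.5  # agent starts 50/50 between utilities u and v

# Option 4: the robot rewires the human to answer "u" or "v" uniformly at
# random, and the agent, trusting the answer, updates all the way to 1 or 0.
rewired = {"u": (0.5, 1.0), "v": (0.5, 0.0)}

print(expected_posterior(rewired))  # 0.5 — equals the prior, so CEE holds
```

The expected posterior matches the prior exactly, so a constraint that only checks CEE cannot distinguish honest inquiry from rewiring the humans.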

Comments

I noticed that CEE is already named in philosophy: conservation of expected ethics is roughly what Arntzenius calls Weak Desire Reflection, and he calls conservation of expected evidence Belief Reflection. [1]

  1. Arntzenius, Frank. "No regrets, or: Edith Piaf revamps decision theory." Erkenntnis 68.2 (2008): 277-297.