Oracle AI - AI Alignment Forum

An Oracle AI is a frequently proposed solution to the problem of developing Friendly AI. It is conceptualized as a super-intelligent system that is designed only to answer questions and has no ability to act in the world. The name was first suggested by Nick Bostrom.

Safety

Armstrong, Sandberg and Bostrom discuss Oracle AI safety at length in their paper Thinking inside the box: using and controlling an Oracle AI. In the paper, the authors propose a conceptual architecture for such a system and review various methods that might be used to measure the Oracle's accuracy. They also examine weaknesses and dangers on the human side, such as psychological vulnerabilities that the Oracle could exploit through social engineering. Some ideas for physical security (also known as “boxing”) are discussed as well, along with which questions may be safe to ask, utility indifference, and many other factors. The paper’s conclusion that Oracles, or AI boxing concepts in general, are safer than fully free agent AIs has long been a subject of debate.

In a related work, Dreams of Friendliness, Eliezer Yudkowsky gives an informal argument that all oracles will be agent-like, that is, driven by their own goals. The argument rests on the idea that anything considered "intelligent" must choose the correct course of action among all actions available. For an Oracle, this means there are many possible things to believe, of which very few are correct. Believing the correct thing therefore means that some method was used to select the correct belief from the many incorrect ones. By definition, this is an optimization process which has a goal of selecting correct beliefs....
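The structure of this argument can be illustrated with a toy sketch. It is not taken from Dreams of Friendliness and says nothing about how an actual Oracle would be built; the function name select_belief, the toy question, and the scoring rule are all hypothetical, chosen only to show that "pick the correct belief out of many candidates" has the shape of an argmax, i.e. an optimization over beliefs.

```python
from typing import Callable, Iterable


def select_belief(candidates: Iterable[str], score: Callable[[str], float]) -> str:
    """Return the candidate belief with the highest score (an argmax)."""
    return max(candidates, key=score)


# Hypothetical toy question: "What is 17 * 3?"
# Many possible answers exist, but only one of them is correct.
candidates = [str(n) for n in range(100)]


def score(belief: str) -> float:
    # Hypothetical correctness criterion: the closer to the true product, the better.
    return -abs(int(belief) - 17 * 3)


# Answering the question amounts to searching the candidates for the best-scoring one.
answer = select_belief(candidates, score)
print(answer)
```

Even in this trivial case, producing the answer requires a criterion and a search that maximizes it, which is the sense in which the argument treats a question-answerer as an optimization process.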
