The Alignment Newsletter #6: 05/14/18

Rohin Shah

Highlights

Thoughts on AI Safety via Debate (Vaniver): Vaniver has played several debate games on the website and wrote up some of his experiences. He ended up more optimistic about debate, but still worries that the success of the technique relies on the toy examples being toy.

My opinion: I haven't played the particular debate game that OpenAI released, so it was interesting to see what sort of strategies emerged. It was initially quite unintuitive to me how debate picks out a particular path in an argument tree, and I think reading about particular concrete examples (as in this post) would have helped.

Prerequisites: AI safety via debate

Technical AI alignment

Problems

Classification of global catastrophic risks connected with artificial intelligence (Alexey Turchin et al)

Scalable oversight

Thoughts on AI Safety via Debate (Vaniver): Summarized in the highlights!

Thoughts on "AI safety via debate" (gworley)

Miscellaneous (Alignment)

Open question: are minimal circuits daemon-free? (Paul Christiano): One issue that may arise with an advanced AI agent is that during training, part of the AI system may develop into a "daemon" -- a consequentialist agent that is optimizing a different goal. This goal may be useful as a subcomponent for our AI, but the daemon may grow in power and end up causing the system to optimize for the subgoal. This could lead to catastrophic outcomes, even if we have specified a reward function that encodes human values to the top-level AI.

In this post, Paul suggests that these issues would likely go away if we choose the fastest program to solve our subgoal. Intuitively, for any daemon that arises as a solution to our problem to cause a bad outcome, it must be carrying out complicated reasoning to figure out whether to solve the problem honestly or to try to mislead us, so we could get a faster program by just not doing that part of the computation. He proposes a particular formalization and poses it as an open question -- if we always choose the minimal (in size) boolean circuit that solves our problem, can a daemon ever arise?
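
To make "minimal (in size) boolean circuit" concrete, here is a rough Python sketch that brute-forces the smallest circuit computing a given truth table. Everything specific here is my own assumption for illustration, not part of Paul's formalization: the choice of NAND as the only gate, the bitmask encoding of truth tables, and the function names. It says nothing about daemons; it only shows what "pick the smallest circuit that solves the problem" means for a toy problem.

```python
def input_tables(n):
    """Truth table of each input, packed as a bitmask over all 2**n assignments.

    Bit `a` of the mask for input i is 1 iff input i is 1 under assignment `a`.
    """
    tables = []
    for i in range(n):
        mask = 0
        for a in range(2 ** n):
            if (a >> i) & 1:
                mask |= 1 << a
        tables.append(mask)
    return tables


def _search(wires, gates, remaining, target, full):
    """Depth-first search over circuits with exactly `remaining` more NAND gates."""
    if remaining == 0:
        # The circuit's output is its last gate.
        return list(gates) if wires[-1] == target else None
    for i in range(len(wires)):
        for j in range(i, len(wires)):
            out = full & ~(wires[i] & wires[j])  # NAND of two existing wires
            found = _search(wires + [out], gates + [(i, j)],
                            remaining - 1, target, full)
            if found is not None:
                return found
    return None


def minimal_nand_circuit(target, n, max_gates=6):
    """Return a smallest list of NAND gates (at least one) computing `target`.

    Each gate is a pair (i, j) of wire indices: wires 0..n-1 are the inputs,
    and wire n+k is the output of gate k. Returns None if nothing is found
    within `max_gates` gates (hypothetical cutoff to keep the search small).
    """
    full = (1 << (2 ** n)) - 1
    inputs = input_tables(n)
    for k in range(1, max_gates + 1):  # try sizes in increasing order
        found = _search(inputs, [], k, target, full)
        if found is not None:
            return found
    return None


# XOR of two inputs: true on assignments 0b01 and 0b10, so the table is 0b0110.
xor_circuit = minimal_nand_circuit(0b0110, n=2)
print(len(xor_circuit), xor_circuit)
```

Because sizes are tried in increasing order, the first circuit found is minimal in gate count. The search is exponential in the number of gates, so this only works for toy problems.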

My opinion: I still don't know what to think about daemons -- they do seem to be a problem in Solomonoff induction, but they seem unlikely to arise in the kinds of neural nets we have today (though they could arise in larger ones). I would love to see more clarity around daemons, especially because the vast majority of current research would not solve this problem: it is a problem with the training process, not the training signal.

Prerequisites: Optimization daemons

AI strategy and policy

To stay ahead of Chinese AI, senators want new commission (Aaron Mehta)

AI capabilities

Deep learning

Dynamic Control Flow in Large-Scale Machine Learning (Yuan Yu et al)

Exploring the Limits of Weakly Supervised Pretraining (Dhruv Mahajan et al)

News

Self-driving cars are here (Andrew Ng): Drive.ai will offer a self-driving car service for public use in Frisco, Texas starting in July 2018. The post goes into detail on how the cars will be rolled out, and on some plans for making them easier for humans to interact with.