By Owain Evans, with input from Jacob Hilton, Dan Hendrycks, Asya Bergal, Owen Cotton-Barratt, and Rohin Shah. Summary: We would like to see research aimed at creating advanced AI systems that are highly competent and do not lie to humans. To operationalize this goal, we introduce the concept of “truthful”...
Chris Olah wrote the following topic prompt for the Open Phil 2021 request for proposals on the alignment of AI systems. We (Asya Bergal and Nick Beckstead) are running the Open Phil RFP and are posting each section as a sequence on the Alignment Forum. Although Chris wrote this document,...
By Ajeya Cotra Training powerful models to maximize simple metrics (such as quarterly profits) could be risky. Sufficiently intelligent models could discover strategies for maximizing these metrics in perverse and unintended ways. For example, the easiest way to maximize profits may turn out to involve stealing money, manipulating whoever keeps...
By Jacob Steinhardt, with input from Beth Barnes As machine learning pervades more and more sectors of society, it brings with it many benefits, but also poses risks, especially as systems become more powerful and difficult to understand and control. It is important to understand these risks as well as...
As part of our work on reducing potential risks from advanced artificial intelligence, Open Philanthropy is seeking proposals for projects working with deep learning systems that could help us understand and make progress on AI alignment: the problem of creating AI systems more capable than their designers that robustly try...
Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we’re looking for feedback both on the RFP itself and on the research directions we’re soliciting proposals in. We’d be really interested in feedback from people on the Alignment Forum on the...