By Owain Evans, with input from Jacob Hilton, Dan Hendrycks, Asya Bergal, Owen Cotton-Barratt, and Rohin Shah. Summary: We would like to see research aimed at creating advanced AI systems that are highly competent and do not lie to humans. To operationalize this goal, we introduce the concept of “truthful”...
Chris Olah wrote the following topic prompt for the Open Phil 2021 request for proposals on the alignment of AI systems. We (Asya Bergal and Nick Beckstead) are running the Open Phil RFP and are posting each section as a sequence on the Alignment Forum. Although Chris wrote this document,...
By Ajeya Cotra Training powerful models to maximize simple metrics (such as quarterly profits) could be risky. Sufficiently intelligent models could discover strategies for maximizing these metrics in perverse and unintended ways. For example, the easiest way to maximize profits may turn out to involve stealing money, manipulating whoever keeps...
By Jacob Steinhardt, with input from Beth Barnes As machine learning pervades more and more sectors of society, it brings with it many benefits, but also poses risks, especially as systems become more powerful and difficult to understand and control. It is important to understand these risks as well as...
As part of our work on reducing potential risks from advanced artificial intelligence, Open Philanthropy is seeking proposals for projects working with deep learning systems that could help us understand and make progress on AI alignment: the problem of creating AI systems more capable than their designers that robustly try...
Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we’re looking for feedback both on the RFP itself and on the research directions we’re soliciting proposals in. We’d be really interested in feedback from people on the Alignment Forum on the...