Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we’re looking for feedback on the RFP and on the research directions in which we’re seeking proposals. We’d especially welcome feedback from people on the Alignment Forum on the current (incomplete) draft of the RFP.
The main RFP text can be viewed here. It links to several documents describing two of the research directions we’re interested in:
- Measuring and forecasting risks
- Techniques for enhancing human feedback [Edit: this previously linked to an older, incorrect version]
Please feel free to comment either directly on the documents, or in the comments section below.
We are unlikely to add or remove research directions at this stage, but we are open to making any other changes, including to the structure of the RFP. We’d be especially interested in the Alignment Forum’s feedback on the research directions we present and on how we present our broader views on AI alignment. It’s important to us that our writing about AI alignment is accurate and easy to understand, and that it’s clear how the research we’re proposing relates to our goal of reducing risks from power-seeking systems.