Open Philanthropy has just launched a large new Request for Proposals for technical AI safety research. Here we're sharing a reference guide, created as part of that RFP, which describes what projects we'd like to see across 21 research directions in technical AI safety.
This guide provides an opinionated overview of recent work and open problems across areas like adversarial testing, model transparency, and theoretical approaches to AI alignment. We link to hundreds of papers and blog posts and offer approximately a hundred different example projects. We hope this is a useful resource for technical people getting started in alignment research. We'd also welcome feedback from the LW community on our prioritization within or across research areas.
For each research area, we include:
Open Philanthropy is launching a big new Request for Proposals for technical AI safety research, with plans to fund roughly $40M in grants over the next 5 months, and available funding for substantially more depending on application quality.
Applications (here) start with a simple 300-word expression of interest and are open until April 15, 2025.
We're seeking proposals across 21 different research areas, organized into five broad categories: