After a year of negotiation, the NSF has announced a $20 million request for proposals for empirical AI safety research.

Here is the detailed program description.


The request for proposals is broad, as is common for NSF RfPs. Many safety avenues, such as transparency and anomaly detection, are in scope:

  • "reverse-engineering, inspecting, and interpreting the internal logic of learned models to identify unexpected behavior that could not be found by black-box testing alone"
  • "Safety also requires... methods for monitoring for unexpected environmental hazards or anomalous system behaviors, including during deployment."


Note that research with high capabilities externalities is explicitly out of scope:

"Proposals that increase safety primarily as a downstream effect of improving standard system performance metrics unrelated to safety (e.g., accuracy on standard tasks) are not in scope."


Thanks to OpenPhil for funding a portion of the RfP; their support was essential to creating this opportunity!
