Open Philanthropy is launching a big new Request for Proposals for technical AI safety research. We plan to fund roughly $40M in grants over the next 5 months, with funding available for substantially more depending on application quality.
Applications (here) start with a simple 300-word expression of interest and are open until April 15, 2025.
Overview
We're seeking proposals across 21 different research areas, organized into five broad categories:
- Adversarial Machine Learning
  - Jailbreaks and unintentional misalignment
  - Control evaluations
  - Backdoors and other alignment stress tests
  - Alternatives to adversarial training
  - Robust unlearning
- Exploring sophisticated misbehavior of LLMs
  - Experiments on alignment faking
  - Encoded reasoning in CoT and inter-model communication
  - Black-box LLM psychology
  - Evaluating whether models can hide dangerous behaviors
  - Reward hacking of human oversight
- Model transparency
  - Applications of white-box techniques
  - Activation monitoring
  - Finding feature