Discovering Language Model Behaviors with Model-Written Evaluations — AI Alignment Forum