Read the associated paper "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?": https://arxiv.org/abs/2407.21792 Focus on safety problems that aren’t solved with scale. Benchmarks are crucial in ML to operationalize the properties we want models to have (knowledge, reasoning, ethics, calibration, truthfulness, etc.). They act as a criterion to judge...