ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks — AI Alignment Forum