METR: Measuring AI Ability to Complete Long Tasks — AI Alignment Forum