Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities
Produced as part of the MATS Program Summer 2024 Cohort. The project is supervised by Marius Hobbhahn and Jérémy Scheurer Update: See also our paper on this topic admitted to the NeurIPS 2024 SoLaR Workshop. Introduction To mitigate risks from future AI systems, we need to assess their capabilities accurately....
Jul 22, 202469