Do LLMs know what they're capable of? Why this matters for AI safety, and initial findings
This post is a companion piece to a forthcoming paper. This work was done as part of MATS 7.0 & 7.1.

Abstract: We explore how LLMs' awareness of their own capabilities affects their ability to acquire resources, sandbag an evaluation, and escape AI control. We quantify LLMs' self-awareness of capability...
Jul 13, 2025