AI Evaluations

AI Evaluations, also called "Evals", experimentally assess the capabilities, safety, and alignment of advanced AI systems. They fall into two main categories: behavioral evaluations, which test what a model does, and understanding-based evaluations, which assess how well its developers understand it.

Current challenges in AI evaluations include:

  • developing a method-agnostic standard to demonstrate sufficient understanding of a model,
  • ensuring that the level of understanding is adequate to catch dangerous failure modes, and
  • finding the right balance between behavioral and understanding-based evaluations.

