AI Evaluations

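AI Evaluations, or "Evals", experimentally assess the capabilities, safety, and alignment of advanced AI systems. They fall into two main categories: behavioral evaluations, which test what a model does on chosen inputs, and understanding-based evaluations, which assess how well its developers can explain why it behaves as it does.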

(Note: initially written by GPT-4; may contain errors despite a human review. Please correct any you find.)

Current challenges in AI evaluations include:

  • developing a method-agnostic standard to demonstrate sufficient understanding of a model,
  • ensuring that the level of understanding is adequate to catch dangerous failure modes, and
  • finding the right balance between behavioral and understanding-based evaluations.

