Clarifying mesa-optimization
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. Thanks to Jérémy Scheurer, Nicholas Dupuis and Evan Hubinger for feedback and discussion.

When people talk about mesa-optimization, they sometimes say things like “we’re searching for the optimizer module” or “we’re doing interpretability to find...
Mar 21, 2023