AI ALIGNMENT FORUM

Cailley Factor — AI Alignment Forum

69

1y

Selective Generalization: Improving Capabilities While Maintaining Alignment

by ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor, and cloud

Ariana Azarbal*, Matthew A. Clarke*, Jorio Cocola*, Cailley Factor*, and Alex Cloud. *Equal Contribution. This work was produced as part of the SPAR Spring 2025 cohort. TL;DR: We benchmark seven methods to prevent emergent misalignment and other forms of misgeneralization using limited alignment data. We demonstrate a consistent tradeoff between...

Jul 16, 2025 • 82