AI ALIGNMENT FORUM

Lewis Ho — AI Alignment Forum

40

4y

Evaluating and monitoring for AI scheming

by Vika, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, and Rohin Shah

As AI models become more sophisticated, a key concern is the potential for “deceptive alignment” or “scheming”. This is the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures put in place by humans to...

Jul 10, 2025 • 53