AI ALIGNMENT FORUM

jiaxin wen — AI Alignment Forum

jiaxin wen

17

2y

Fabien's Shortform

Thanks for sharing! I'm a bit surprised that sleeper agents is listed as the best demo (e.g., ranked higher than alignment faking). Are you focusing on the main idea rather than the specific operationalization here? I ask because I think backdoored/password-locked LMs could be quite different from real-world threat models.