(Some) Natural Emergent Misalignment from Reward Hacking in Non-Production RL — AI Alignment Forum