Reward is not the optimization target — AI Alignment Forum