x
Reward IS the Optimization Target — AI Alignment Forum