Mode collapse in RL may be fueled by the update equation — AI Alignment Forum