Model-driven feedback could amplify alignment failures — AI Alignment Forum