Can corrigibility be learned safely? — AI Alignment Forum