Multi-dimensional rewards for AGI interpretability and control — AI Alignment Forum