Training Process Transparency through Gradient Interpretability: Early experiments on toy language models — AI Alignment Forum