Transformers Don't Need LayerNorm at Inference Time: Implications for Interpretability — AI Alignment Forum