On Interpretability's Robustness