I gave a talk at MIT in March of this year on the barriers to mechanistic interpretability being helpful for AGI/ASI safety, and on why, by default, it will likely be net dangerous. Several people seem to be coming to similar conclusions recently (e.g., this recent post).
I discuss two major points (by no means exhaustive) that present barriers to MI addressing AGI risk: one technical and one political.
That said, there are more nuances to this opinion, and much of it is downstream of a lack of coordination and of the downsides of publishing in an adversarial environment like the one we are in right now. I still endorse the work done by e.g. Chris Olah's team as brilliant but extremely early scientific work with steep epistemological hurdles to overcome. Unfortunately, I also believe that, on net, work such as Olah's is currently more useful as a safety-washing tool for AGI labs like Anthropic than as something that makes a real dent in existential risk concerns.
Here are the slides from my talk, and you can find the video here.