These are notes taken during a call with Itay Yona, an expert in software/hardware reverse engineering (SRE). Itay gave me an excellent distillation of key ideas and mindsets in the field, and we discussed analogies/disanalogies to mechanistic interpretability of neural networks. I’m generally very excited to learn about other fields of study that reverse engineer complex systems, and what relevant insights they may have (SRE, neuroscience, systems biology, etc). All mistakes are mine, and all insights are his!
For more discussion on similarities, see Chris Olah’s essay Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases
The distinction between "newbies get caught up trying to understand every detail, experts think in higher-level abstractions, make educated guesses, and only zoom in on the details that matter" felt super interesting and surprising to me.
I claim that this is 1) an instance of a common pattern that 2) is currently missing a step (the pre-newbie stage).
The general pattern is the following (terminology borrowed from Terry Tao):
I think that many experts mainly notice the 2->3 transition, but not the 1->2 one, and so often dissuade newbies by encouraging them to not work in the rigorous stage. I claim this is really, really bad, and that a solid understanding of the rigorous stage is a good idea for ~everyone doing technical work.
Here's a few examples:
I like the analogy! I hadn't explicitly made the connection, but strongly agree (both that this is an important general phenomena, and that it specifically applies here). Though I'm pretty unsure how much I/other MI researchers are in 1 vs 3 when we try to reason about systems!
To be clear, I definitely do not want to suggest that people don't try to rigorously reverse engineer systems a bunch, and be super details oriented. Linked to your comment in the post.