
Interpretability Research for the Most Important Century

Apr 25, 2022 by Evan R. Murphy

This series of posts attempts to answer the following question from Holden Karnofsky's Important, actionable research questions for the most important century (which also inspired the name of this sequence):

“What relatively well-scoped research activities are particularly likely to be useful for longtermism-oriented AI alignment?”

As one answer to Holden's question, I explore the argument that interpretability research is one of these high-leverage activities in AI alignment research.

 

The featured image was created with DALL·E.

Posts in this sequence:

1. Introduction to the sequence: Interpretability Research for the Most Important Century — Evan R. Murphy
2. Interpretability's Alignment-Solving Potential: Analysis of 7 Scenarios — Evan R. Murphy