AI ALIGNMENT FORUM
Nicholas Goldowsky-Dill
Interpretability Researcher at Apollo Research
Posts
Karma | Title | Age | Comments
54 | A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team | 9d | 6
42 | Apollo Research 1-year update | 2mo | 0
55 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | 2mo | 1
28 | Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning | 2mo | 5
22 | Causal scrubbing: results on induction heads | 2y | 0
20 | Causal scrubbing: results on a paren balance checker | 2y | 2
11 | Causal scrubbing: Appendix | 2y | 1
99 | Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] | 2y | 29