AI ALIGNMENT FORUM
Nicholas Goldowsky-Dill
Interpretability Researcher at Apollo Research
Posts
A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team (4mo)
Apollo Research 1-year update (5mo)
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks (5mo)
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning (6mo)
Causal scrubbing: results on induction heads (2y)
Causal scrubbing: results on a paren balance checker (2y)
Causal scrubbing: Appendix (2y)
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] (2y)