wesg — AI Alignment Forum
Wes Gurnee
Operations Research PhD student at MIT working on interpretability.
Find out more here: https://wesg.me/
Posts
- Refusal in LLMs is mediated by a single direction (77 karma, 2y ago, 44 comments)
- SAE reconstruction errors are (empirically) pathological (52 karma, 2y ago, 1 comment)
- Finding Neurons in a Haystack: Case Studies with Sparse Probing (19 karma, 3y ago, 1 comment)