AI ALIGNMENT FORUM
Sparse Autoencoders (SAEs)
• Applied to Are SAE features from the Base Model still meaningful to LLaVA? by Shan Chen 5d ago
• Applied to Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders by PaulPauls 18d ago
• Applied to Analyzing how SAE features evolve across a forward pass by bensenberner 1mo ago
• Applied to SAEs are highly dataset dependent: a case study on the refusal direction by Connor Kissane 1mo ago
• Applied to Evolutionary prompt optimization for SAE feature visualization by neverix 1mo ago
• Applied to SAE Probing: What is it good for? Absolutely something! by Subhash Kantamneni 1mo ago
• Applied to A suite of Vision Sparse Autoencoders by Louka Ewington-Pitsos 1mo ago
• Applied to SAEs you can See: Applying Sparse Autoencoders to Clustering by Robert_AIZI 2mo ago
• Applied to On the Practical Applications of Interpretability by Ruben Bloom 2mo ago
• Applied to It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation by Gerard Boxo 2mo ago
• Applied to Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution by Kola Ayonrinde 2mo ago
• Applied to SAE features for refusal and sycophancy steering vectors by neverix 2mo ago
• Applied to HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix by Jaehyuk Lim 2mo ago
• Applied to Domain-specific SAEs by jacob_drori 2mo ago
• Applied to An X-Ray is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation by hugofry 2mo ago
• Applied to Exploring SAE features in LLMs with definition trees and token lists by mwatkins 2mo ago
• Applied to Interpretability of SAE Features Representing Check in ChessGPT by Jonathan Kutasov 2mo ago
• Applied to Toy Models of Feature Absorption in SAEs by chanind 2mo ago
• Applied to Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models? by Raymond Arnold 2mo ago