AI ALIGNMENT FORUM
Interpretability (ML & AI)
• Applied to Composition Circuits in Vision Transformers (Hypothesis) by phenomanon 7h ago
• Applied to SAE Probing: What is it good for? Absolutely something! by Subhash Kantamneni 11h ago
• Applied to Bridging the VLM and mech interp communities for multimodal interpretability by Sonia Joseph 5d ago
• Applied to Open Source Replication of Anthropic’s Crosscoder paper for model-diffing by Connor Kissane 5d ago
• Applied to Enabling New Applications with Today's Mechanistic Interpretability Toolkit by ananya_joshi 8d ago
• Applied to SAEs you can See: Applying Sparse Autoencoders to Clustering by Robert_AIZI 10d ago
• Applied to Monosemanticity & Quantization by Rahul Chand 10d ago
• Applied to A Rocket–Interpretability Analogy by plex 12d ago
• Applied to A short project on Mamba: grokking & interpretability by Alejandro Tlaie Boria 15d ago
• Applied to The Computational Complexity of Circuit Discovery for Inner Interpretability by Bogdan Ionut Cirstea 16d ago
• Applied to Circuits in Superposition: Compressing many small neural networks into one by Lucius Bushnaq 19d ago
• Applied to It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation by Gerard Boxo 19d ago
• Applied to Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution by Kola Ayonrinde 20d ago
• Applied to SAE features for refusal and sycophancy steering vectors by neverix 21d ago
• Applied to EIS XIV: Is mechanistic interpretability about to be practically useful? by Stephen Casper 21d ago
• Applied to Hamiltonian Dynamics in AI: A Novel Approach to Optimizing Reasoning in Language Models by Javier Marin Valenzuela 23d ago
• Applied to There is a globe in your LLM by jacob_drori 25d ago
• Applied to Domain-specific SAEs by jacob_drori 25d ago
• Applied to Exploring SAE features in LLMs with definition trees and token lists by mwatkins 1mo ago