AI ALIGNMENT FORUM

nps29 — AI Alignment Forum

nps29

46

2y

JumpReLU SAEs + Early Access to Gemma 2 SAEs

by Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár, and Neel Nanda

New paper from the Google DeepMind mechanistic interpretability team, led by Sen Rajamanoharan! We introduce JumpReLU SAEs, a new SAE architecture that replaces the standard ReLUs with discontinuous JumpReLU activations, and seems to be (narrowly) state of the art over existing methods like TopK and Gated SAEs for achieving high...

Jul 19, 2024 • 55