AI ALIGNMENT FORUM
MATS Program
• Applied to End-to-end hacking with language models by Timothée Chauvin (21d ago)
• Applied to Ophiology (or, how the Mamba architecture works) by Danielle Ensign (23d ago)
• Applied to My MATS Summer 2023 experience by James Chua (1mo ago)
• Applied to Understanding SAE Features with the Logit Lens by Joseph Isaac Bloom (2mo ago)
• Applied to MATS AI Safety Strategy Curriculum by Ryan Kidd (2mo ago)
• Applied to We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To by robertzk (2mo ago)
• Applied to Implementing activation steering by Annah (3mo ago)
• Applied to Attention SAEs Scale to GPT-2 Small by robertzk (3mo ago)
• Applied to How important is AI hacking as LLMs advance? by artkpv (3mo ago)
• Applied to Uncertainty in all its flavours by Cleo Nardo (3mo ago)
• Applied to Sparse Autoencoders Work on Attention Layer Outputs by robertzk (3mo ago)
• Applied to Case Studies in Reverse-Engineering Sparse Autoencoder Features by Using MLP Linearization by Jacob Dunefsky (4mo ago)
• Applied to Steering Llama-2 with contrastive activation additions by Alex Turner (4mo ago)
• Applied to Interview: Applications w/ Alice Rigg by jacobhaimes (4mo ago)
• Applied to MATS Summer 2023 Retrospective by Ryan Kidd (5mo ago)
• Applied to Classifying representations of sparse autoencoders (SAEs) by Annah (5mo ago)
• Applied to Game Theory without Argmax [Part 2] by Cleo Nardo (5mo ago)
• Applied to Game Theory without Argmax [Part 1] by Cleo Nardo (5mo ago)
• Applied to Polysemantic Attention Head in a 4-Layer Transformer by Jett Janiak (6mo ago)