AI ALIGNMENT FORUM
Transformers
• Applied to Transformers Represent Belief State Geometry in their Residual Stream (by Adam Shai, 3d ago)
• Applied to Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? (by right..enough?, 7d ago)
• Applied to Understanding mesa-optimization using toy models (by tilmanr, 18d ago)
• Applied to Decompiling Tracr Transformers - An interpretability experiment (by Hannes Thurnherr, 24d ago)
• Applied to Modern Transformers are AGI, and Human-Level (by jacobjacob, 25d ago)
• Applied to Deconfusing In-Context Learning (by Arjun Panickssery, 2mo ago)
• Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search (by Arjun Panickssery, 2mo ago)
• Applied to Attention SAEs Scale to GPT-2 Small (by robertzk, 3mo ago)
• Applied to Striking Implications for Learning Theory, Interpretability — and Safety? (by Roger Dearnaley, 3mo ago)
• Applied to AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them (by Roman Leventov, 4mo ago)
• Applied to Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained (by Zeping Yu, 4mo ago)
• Applied to Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization? (by Bill Benzon, 4mo ago)
• Applied to The Method of Loci: With some brief remarks, including transformers and evaluating AIs (by Bill Benzon, 5mo ago)
• Applied to New Tool: the Residual Stream Viewer (by Adam Yedidia, 7mo ago)
• Applied to World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs] (by Bill Benzon, 7mo ago)
• Applied to Google DeepMind's RT-2 (by SandXbox, 8mo ago)
• Applied to The positional embedding matrix and previous-token heads: how do they actually work? (by Adam Yedidia, 8mo ago)
• Applied to How LLMs are and are not myopic (by janus, 9mo ago)
• Applied to GPT-2's positional embedding matrix is a helix (by Adam Yedidia, 9mo ago)
• Applied to Killing Recurrent Memory Over Self Attention? (by Del Nobolo, 10mo ago)