AI Alignment Forum: Tags

Transformers

• Applied to Transformers Represent Belief State Geometry in their Residual Stream by Adam Shai 3d ago
• Applied to Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? by right..enough? 7d ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 18d ago
• Applied to Decompiling Tracr Transformers - An interpretability experiment by Hannes Thurnherr 24d ago
• Applied to Modern Transformers are AGI, and Human-Level by jacobjacob 25d ago
• Applied to Deconfusing In-Context Learning by Arjun Panickssery 2mo ago
• Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search by Arjun Panickssery 2mo ago
• Applied to Attention SAEs Scale to GPT-2 Small by robertzk 3mo ago
• Applied to Striking Implications for Learning Theory, Interpretability — and Safety? by Roger Dearnaley 3mo ago
• Applied to AGI will be made of heterogeneous components, Transformer and Selective SSM blocks will be among them by Roman Leventov 4mo ago
• Applied to Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained by Zeping Yu 4mo ago
• Applied to Has anyone experimented with Dodrio, a tool for exploring transformer models through interactive visualization? by Bill Benzon 4mo ago
• Applied to The Method of Loci: With some brief remarks, including transformers and evaluating AIs by Bill Benzon 5mo ago
• Applied to New Tool: the Residual Stream Viewer by Adam Yedidia 7mo ago
• Applied to World, mind, and learnability: A note on the metaphysical structure of the cosmos [& LLMs] by Bill Benzon 7mo ago
• Applied to Google DeepMind's RT-2 by SandXbox 8mo ago
• Applied to The positional embedding matrix and previous-token heads: how do they actually work? by Adam Yedidia 8mo ago
• Applied to How LLMs are and are not myopic by janus 9mo ago
• Applied to GPT-2's positional embedding matrix is a helix by Adam Yedidia 9mo ago
• Applied to Killing Recurrent Memory Over Self Attention? by Del Nobolo 10mo ago