Adam Shai

Neuroscientist turned Interpretability Researcher. Starting Simplex, an AI Safety Research Org.

1453

206

121

Adam Shai — AI Alignment Forum

Simplex Progress Report - July 2025

Thanks to Jasmina Urdshals, Xavier Poncini, and Justis Mills for comments. Introduction: At Simplex our mission is to develop a principled science of the representations and emergent behaviors of AI systems. Our initial work showed that transformers linearly represent belief state geometries in their residual streams. We think of that...

Jul 28, 2025•113

Transformers Represent Belief State Geometry in their Residual Stream

Produced while being an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. Work done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work....

Apr 16, 2024•439

Beyond Kolmogorov and Shannon

by Alexander Gietelink Oldenziel and Adam Shai

This post is the first in a sequence that will describe James Crutchfield's Computational Mechanics framework. We feel this is one of the most theoretically sound and promising approaches towards understanding Transformers in particular and interpretability more generally. As a heads up: Crutchfield's framework will take many posts to fully...

Oct 25, 2022•63