
AI ALIGNMENT FORUM


Transformer Circuits

This page is a stub.


Posts tagged Transformer Circuits

Finding Neurons in a Haystack: Case Studies with Sparse Probing (wesg, Neel Nanda; 19 karma; 3y ago; 1 comment)

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 (53 karma; 2y ago; 11 comments)

200 Concrete Open Problems in Mechanistic Interpretability: Introduction (41 karma; 4y ago; 0 comments)

Finding Sparse Linear Connections between Features in LLMs (Logan Riggs, Sam Mitchell, Adam Kaufman; 26 karma; 3y ago; 2 comments)

How to Think About Activation Patching (23 karma; 3y ago; 3 comments)

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla (Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vlad Mikulik; 24 karma; 3y ago; 0 comments)

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy (16 karma; 3y ago; 1 comment)

200 COP in MI: Exploring Polysemanticity and Superposition (18 karma; 4y ago; 1 comment)

200 COP in MI: Interpreting Algorithmic Problems (17 karma; 4y ago; 0 comments)

A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien) (16 karma; 4y ago; 15 comments)

A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2 (11 karma; 4y ago; 0 comments)

Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones. (Ilia Shirokov, Ilya Nachevsky; 9 karma; 1y ago; 0 comments)

200 COP in MI: Looking for Circuits in the Wild (8 karma; 4y ago; 3 comments)

200 COP in MI: Analysing Training Dynamics (11 karma; 4y ago; 0 comments)

200 COP in MI: Techniques, Tooling and Automation (7 karma; 4y ago; 0 comments)

Showing 15 of 23 tagged posts.