AI ALIGNMENT FORUM
Interpretability (ML & AI)
• Applied to Announcing Apollo Research by Marius Hobbhahn 5h ago
• Applied to Aligning an H-JEPA agent via training on the outputs of an LLM-based "exemplary actor" by Roman Leventov 10h ago
• Applied to The king token by p.b. 2d ago
• Applied to Why and When Interpretability Work is Dangerous by Nicholas Kross 3d ago
• Applied to Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2 by Stefan Heimersheim 5d ago
• Applied to [Linkpost] Interpretability Dreams by DanielFilan 6d ago
• Applied to 'Fundamental' vs 'applied' mechanistic interpretability research by Lee Sharkey 7d ago
• Applied to Activation additions in a small residual network by Raymond Arnold 8d ago
• Applied to Gender Vectors in ROME’s Latent Space by Xodarap 9d ago
• Applied to A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N) by Joseph Isaac Bloom 14d ago
• Applied to My current workflow to study the internal mechanisms of LLM by Yulu Pi 14d ago
• Applied to Input Swap Graphs: Discovering the role of neural network components at scale by Alexandre Variengien 18d ago
• Applied to AI interpretability could be harmful? by Roman Leventov 20d ago
• Applied to New OpenAI Paper - Language models can explain neurons in language models by Raymond Arnold 20d ago
• Applied to AGI-Automated Interpretability is Suicide by __RicG__ 20d ago
• Applied to Have you heard about MIT's "liquid neural networks"? What do you think about them? by Ppau 21d ago
• Applied to Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1 by Stefan Heimersheim 21d ago