AI ALIGNMENT FORUM

Seonglae Cho — AI Alignment Forum

Seonglae Cho

based in London, England

https://github.com/seonglae

https://www.linkedin.com/in/seonglae

SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features

Abstract: Sparse Autoencoders (SAEs) linearly extract interpretable features from a large language model's intermediate representations. However, the basic dynamics of SAEs, such as the activation values of SAE features and the encoder and decoder weights, have not been as extensively visualized as their implications. To shed light on the properties...

Feb 26, 2025 • 4