AI ALIGNMENT FORUM

Wei Shi

Comments

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Wei Shi · 10mo

I got it, thank you very much!

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Wei Shi · 10mo

"We trained a crosscoder of width 16,384 on the residual stream activations from the middle layer of the Gemma-2 2B base and IT models."

I don't understand the training process here, or the one described in Anthropic's mini-paper. How do you train a single crosscoder on residual stream activations from two different models?
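
For context, here is a minimal sketch of how one crosscoder can be fed by two models. It assumes the usual crosscoder formulation as I understand it: both models run on the same tokens, each model gets its own encoder and decoder weights, and the two encoder outputs are summed into one shared set of latents. It is not the replication's actual code. The names `crosscoder_loss`, `matvec`, `random_matrix`, and `l1_coeff`, and the toy sizes, are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def crosscoder_loss(acts, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Loss for one token position (hypothetical helper, not the post's code).

    acts:  {model_name: activation vector} for the SAME token in each model
    W_enc: {model_name: matrix of shape (n_latents, d_model)}
    W_dec: {model_name: matrix of shape (d_model, n_latents)}
    b_dec: {model_name: bias vector of length d_model}
    """
    n_latents = len(b_enc)
    # Shared latents: sum every model's encoder contribution, then apply ReLU.
    pre = list(b_enc)
    for m, a in acts.items():
        for i, v in enumerate(matvec(W_enc[m], a)):
            pre[i] += v
    f = [max(0.0, p) for p in pre]

    # Each model's activation is reconstructed from the same latents
    # using that model's own decoder.
    recon_err = 0.0
    for m, a in acts.items():
        a_hat = [r + b for r, b in zip(matvec(W_dec[m], f), b_dec[m])]
        recon_err += sum((x - y) ** 2 for x, y in zip(a, a_hat))

    # Sparsity penalty: each latent is weighted by the sum, over models,
    # of the norm of its decoder column.
    sparsity = 0.0
    for i in range(n_latents):
        norm_sum = sum(
            math.sqrt(sum(row[i] ** 2 for row in W_dec[m])) for m in acts
        )
        sparsity += f[i] * norm_sum
    return recon_err + l1_coeff * sparsity, f

def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, n_latents = 4, 8  # toy sizes; the post uses a width of 16,384
models = ["base", "it"]
W_enc = {m: random_matrix(rng, n_latents, d_model) for m in models}
W_dec = {m: random_matrix(rng, d_model, n_latents) for m in models}
b_enc = [0.0] * n_latents
b_dec = {m: [0.0] * d_model for m in models}
# Stand-ins for paired residual stream activations on one token.
acts = {m: [rng.gauss(0.0, 1.0) for _ in range(d_model)] for m in models}

loss, latents = crosscoder_loss(acts, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.01)
```

Training would minimize this loss by gradient descent over many paired activations. Under this setup, a latent whose decoder column is large for one model and near zero for the other is a candidate model-specific feature, which is what makes the shared dictionary useful for model diffing.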
