Polysemantic Attention Head in a 4-Layer Transformer
Produced as part of the MATS Program, under the mentorship of Neel Nanda and Lee Sharkey.

Epistemic status: optimized to get the post out quickly, but we are confident in the main claims.

TL;DR: head 1.4 in attn-only-4l exhibits many different attention patterns, all of which are relevant to the model's performance.

Introduction

...
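Since the TL;DR centers on the attention patterns of a single head, here is a minimal sketch of how a reader could inspect them for themselves. It assumes the TransformerLens library and its pretrained attn-only-4l checkpoint; the prompt is a hypothetical placeholder, not one from our analysis.

```python
# Minimal sketch: inspect head 1.4's attention pattern in attn-only-4l.
# Assumes TransformerLens provides this checkpoint under this name.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("attn-only-4l")

# Hypothetical example prompt, chosen only for illustration.
prompt = "When Mary and John went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

# Attention patterns for layer 1: shape [batch, n_heads, query_pos, key_pos].
pattern = cache["pattern", 1]
head_1_4 = pattern[0, 4]  # head 1.4 = layer 1, head index 4

# For each query token, show which key token head 1.4 attends to most.
str_tokens = model.to_str_tokens(prompt)
for q, tok in enumerate(str_tokens):
    k = head_1_4[q].argmax().item()
    print(f"{tok!r:>12} -> {str_tokens[k]!r} ({head_1_4[q, k]:.2f})")
```

Running this on a variety of prompts is one quick way to see, informally, that the head's attention behavior is not described by a single fixed pattern.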