Measuring Structure Development in Algorithmic Transformers
tl;dr: We compute the evolution of the local learning coefficient (LLC), a proxy for model complexity, for an algorithmic transformer. The LLC decreases as the model learns more structured solutions, such as head specialization. This post is structured in three main parts: (1) a summary, giving an overview of the...
Aug 22, 2024