The linked note is something I "noticed" while going through different versions of this result in the literature. I think this sort of mathematical work on neural networks is worthwhile and should be done to a high standard, but I have no reason to think that this particular work is...
These remarks are basically me just wanting to get my thoughts down after a Twitter exchange on this subject. I've not spent much time on this post, and it's certainly plausible that I've gotten things wrong. In the 'Key Takeaways' section of the Modular Addition part of the well-known post...
In Thought Experiments Provide a Third Anchor, Jacob Steinhardt wrote about the relative merits of a few different reference classes when it comes to reasoning and making predictions about future machine learning systems. He refers to these reference classes as ‘anchors’ and writes:

> There are many other anchors that...
> From a mathematical point of view, the building and training of a large transformer language model (LLM) is the construction of a certain function, from some Euclidean space to another, that has certain interesting properties. And it may therefore be surprising to find that many key papers announcing...
If the trajectory of the deep learning paradigm continues, it seems plausible to me that, for applications of low-level interpretability to AI not-kill-everyone-ism to be truly reliable, we will need a much better-developed and more general theoretical and mathematical framework for deep learning than currently exists. And this...
Anthropic's recent mechanistic interpretability paper, Toy Models of Superposition, helps to demonstrate the conceptual richness of very small feedforward neural networks. Even when such networks are trained on synthetic, hand-coded data to reconstruct a very straightforward function (the identity map), non-trivial mathematics appears to be at play, and the analysis of...
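To make the setup concrete, here is a minimal NumPy sketch of the kind of toy model that paper studies: sparse features compressed into a lower-dimensional hidden space and then reconstructed through a ReLU. This is my own illustrative re-implementation under assumed hyperparameters (the feature count, sparsity, and learning rate here are placeholders, not those of the original paper).

```python
import numpy as np

# Sketch of the toy-superposition setup: n sparse features are mapped
# into an m < n dimensional hidden space by W and reconstructed as
# ReLU(W^T W x + b). All hyperparameters below are illustrative.
rng = np.random.default_rng(0)
n_features, m_hidden, sparsity = 5, 2, 0.1

W = 0.1 * rng.standard_normal((m_hidden, n_features))
b = np.zeros(n_features)

def sample_batch(k):
    # Each feature is active independently with probability `sparsity`,
    # with a uniform [0, 1] magnitude when active.
    x = rng.uniform(size=(k, n_features))
    return x * (rng.uniform(size=(k, n_features)) < sparsity)

def forward(x, W, b):
    h = x @ W.T                  # compress: (k, m)
    z = h @ W + b                # reconstruction pre-activation: (k, n)
    return h, z, np.maximum(z, 0.0)

def loss_of(x, W, b):
    return float(np.mean(np.sum((forward(x, W, b)[2] - x) ** 2, axis=1)))

x_eval = sample_batch(1024)
loss_start = loss_of(x_eval, W, b)

lr = 0.05
for _ in range(2000):
    x = sample_batch(256)
    k = x.shape[0]
    h, z, xhat = forward(x, W, b)
    g = 2.0 * (xhat - x) * (z > 0)            # dL/dz, shape (k, n)
    grad_b = g.mean(axis=0)
    # W appears twice (encoder and decoder), giving two gradient terms:
    grad_W = (h.T @ g) / k + W @ (g.T @ x) / k
    W -= lr * grad_W
    b -= lr * grad_b

loss_end = loss_of(x_eval, W, b)
print(loss_start, "->", loss_end)  # reconstruction loss should drop
```

Even in this tiny model, inspecting the learned columns of W after training is where the interesting geometry shows up: with more features than hidden dimensions, the model must trade features off against one another.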
In this note I will discuss some computations and observations that I have seen in other posts about "basin broadness/flatness". I am mostly working off the content of the posts Information Loss --> Basin flatness and Basin broadness depends on the size and number of orthogonal features. I will attempt...
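As a warm-up for the kind of computation those posts are concerned with, here is a toy numerical illustration of the link between information loss and flat basin directions. At a zero-loss minimum of an MSE loss, the Hessian reduces to \(2 J^\top J\), where \(J\) is the Jacobian of the network's outputs on the dataset with respect to the parameters; with fewer data points than parameters, \(J\) cannot have full column rank, so the Hessian must have (numerically) zero eigenvalues. The specific model, parameter values, and data below are hypothetical choices of mine, not taken from either post.

```python
import numpy as np

# Toy model with 3 parameters fit to 2 data points. Since the data is
# generated from theta_star, theta_star is a zero-loss minimum, and the
# Hessian there equals 2 J^T J with J of shape (2, 3). rank(J) <= 2, so
# at least one of the 3 Hessian eigenvalues must be ~0: a flat direction.

def f(theta, x):
    a, w, c = theta
    return a * np.tanh(w * x + c)

theta_star = np.array([1.5, 0.7, -0.3])   # hypothetical parameters
xs = np.array([0.5, -1.0])                # two data points only
ys = f(theta_star, xs)                    # so loss(theta_star) == 0

def loss(theta):
    return float(np.sum((f(theta, xs) - ys) ** 2))

def hessian(fun, theta, h=1e-4):
    # Central finite-difference Hessian; fine for a smooth 3-parameter toy.
    d = len(theta)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            tpp = theta.copy(); tpp[i] += h; tpp[j] += h
            tpm = theta.copy(); tpm[i] += h; tpm[j] -= h
            tmp = theta.copy(); tmp[i] -= h; tmp[j] += h
            tmm = theta.copy(); tmm[i] -= h; tmm[j] -= h
            H[i, j] = (fun(tpp) - fun(tpm) - fun(tmp) + fun(tmm)) / (4 * h * h)
    return H

H = hessian(loss, theta_star)
eigs = np.linalg.eigvalsh(H)  # ascending order
print(eigs)                   # smallest eigenvalue ~ 0: a flat direction
```

The same counting argument is what the posts above generalise: directions in parameter space along which the network's behaviour on the training data does not change are necessarily flat directions of the training loss.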