Top posts
Abstract Mathematical models can describe neural network architectures and training environments; however, the learned representations that emerge have remained difficult to model. Here we build a new theoretical model of internal representations, framed in terms of economics and information theory. We distinguish niches of value that representations can...
While I think about interpretability a lot, it's not my day job! Let me dive down a rabbit hole; tell me where I am wrong. Intro As I see it, the first step toward interpretability would be isolating which neurons perform which role. For example, which neurons are representing...