What’s in the box?! – Towards interpretability by distinguishing niches of value within neural networks.
Abstract Mathematical models can describe neural network architectures and training environments; however, the learned representations that emerge have remained difficult to model. Here we build a new theoretical model of internal representations, framed in terms of economics and information theory. We distinguish niches of value that representations can...
Feb 29, 2024