Deep sparse autoencoders yield interpretable features too
Summary
* I sandwich the sparse layer in a sparse autoencoder (SAE) between non-sparse lower-dimensional layers and refer to this as a deep SAE.
* I find that features from deep SAEs are at least as interpretable as features from standard shallow SAEs.
* I claim that this is not...
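To make the architecture in the first bullet concrete, here is a minimal forward-pass sketch of a deep SAE in NumPy. It is not the post's implementation. The class name `DeepSAE`, the layer sizes, the ReLU activations, the top-k rule used to enforce sparsity, the omission of biases, and the random initialization are all assumptions chosen for illustration. The only thing taken from the summary is the layout: a wide sparse layer sandwiched between narrower non-sparse layers.

```python
import numpy as np


def top_k(x, k):
    """Keep the k largest entries in each row of x and zero out the rest."""
    idx = np.argpartition(x, -k, axis=-1)[..., -k:]  # positions of the k largest
    out = np.zeros_like(x)
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out


class DeepSAE:
    """Hypothetical deep SAE: dense -> wide sparse layer -> dense -> output.

    Assumptions (not from the post): ReLU activations, top-k sparsity,
    no biases, Gaussian initialization.
    """

    def __init__(self, d_in, d_hidden, d_sparse, k, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            return rng.normal(0.0, 1.0 / np.sqrt(rows), size=(rows, cols))

        self.k = k
        self.w_enc_dense = init(d_in, d_hidden)       # non-sparse encoder layer
        self.w_enc_sparse = init(d_hidden, d_sparse)  # projection into the sparse layer
        self.w_dec_dense = init(d_sparse, d_hidden)   # non-sparse decoder layer
        self.w_out = init(d_hidden, d_in)             # linear map back to the input space

    def encode(self, x):
        hidden = np.maximum(x @ self.w_enc_dense, 0.0)
        pre = np.maximum(hidden @ self.w_enc_sparse, 0.0)
        return top_k(pre, self.k)  # at most k nonzero features per row

    def decode(self, features):
        hidden = np.maximum(features @ self.w_dec_dense, 0.0)
        return hidden @ self.w_out

    def forward(self, x):
        features = self.encode(x)
        return features, self.decode(features)


if __name__ == "__main__":
    sae = DeepSAE(d_in=16, d_hidden=32, d_sparse=128, k=4)
    batch = np.random.default_rng(1).normal(size=(5, 16))
    features, reconstruction = sae.forward(batch)
    print(features.shape, reconstruction.shape)
```

In this sketch a standard shallow SAE corresponds to dropping the two dense layers, so that the input maps directly into the sparse layer and the sparse layer maps directly back to the input space. Training (for example, minimizing reconstruction error) is left out.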
Feb 23, 2025