All of Nora Belrose's Comments + Replies

I think it would be a distraction to try to figure out if LMs are "phenomenally conscious" for a few different reasons.

  1. I think there are pretty strong reasons to believe that phenomenal consciousness is not actually a substantive property, in the sense that either everything has it in some sense (panpsychism) or nothing does (eliminativism). Any other solution confronts the Hard Problem and the empirical intractability of actually figuring out which things are or are not phenomenally conscious.
  2. Your proposed tests for phenomenal consciousness seem to, in fa...

This probably doesn't work, but have you thought about just using weight decay as a (partial) solution to this? In any sort of architecture with residual connections you should expect circuits to manifest as weights with nontrivial magnitude. If some set of weights isn't contributing to the loss then the gradients won't prevent them from being pushed toward zero by weight decay. Sort of a "use it or lose it" type thing. This seems a lot simpler and potentially more robust than other approaches.
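
To make the "use it or lose it" intuition concrete, here is a minimal sketch in plain Python. It is a toy, not a claim about real networks: the two-weight model, the data, the function name `train`, and the hyperparameters are all assumptions made for illustration. One weight affects the loss and the other never does, so the second weight's only update comes from the decay term.

```python
def train(steps=2000, lr=0.1, weight_decay=0.01):
    """Gradient descent with L2 weight decay on a hypothetical toy model.

    Model: y_hat = w_used * x + w_unused * 0.0
    w_unused is multiplied by a constant zero input, so the loss gradient
    with respect to it is always exactly zero.
    """
    w_used, w_unused = 0.5, 0.5
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x

    for _ in range(steps):
        # Mean squared error gradient for the weight that matters.
        grad_used = 0.0
        for x, y in data:
            err = w_used * x - y
            grad_used += 2 * err * x / len(data)

        # The unused weight receives no gradient from the loss.
        grad_unused = 0.0

        # Weight decay adds weight_decay * w to each gradient.
        w_used -= lr * (grad_used + weight_decay * w_used)
        w_unused -= lr * (grad_unused + weight_decay * w_unused)

    return w_used, w_unused


if __name__ == "__main__":
    print("with decay:   ", train())
    print("without decay:", train(weight_decay=0.0))
```

In this sketch the used weight settles close to 2 because the loss gradient counteracts the decay, while the unused weight shrinks by a factor of `1 - lr * weight_decay` every step. With `weight_decay=0.0` the unused weight stays at its initial value. Whether dormant circuits in a real network get exactly zero gradient is the open question the comment hedges on.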