I wonder, almost just idle curiosity, whether or not the "measuring-via-proxy will cause value drift" is something we could formalize and iterate on first. Is the problem stable on the meta-level, or is there a way we can meaningfully define "not drifting from the proxy" without just generally solving alignment.
Intuitively I'd guess this is the "don't try to be cute" class of thought, but I was afraid to post at all and decided that I wanted to interact, even at the cost of (probably) saying something embarassing.
I wonder, almost just idle curiosity, whether or not the "measuring-via-proxy will cause value drift" is something we could formalize and iterate on first. Is the problem stable on the meta-level, or is there a way we can meaningfully define "not drifting from the proxy" without just generally solving alignment.
Intuitively I'd guess this is the "don't try to be cute" class of thought, but I was afraid to post at all and decided that I wanted to interact, even at the cost of (probably) saying something embarassing.