AI ALIGNMENT FORUM

curtisrussell


Comments

Worlds Where Iterative Design Fails
curtisrussell 3y

I wonder, almost out of idle curiosity, whether the "measuring-via-proxy will cause value drift" problem is something we could formalize and iterate on first. Is the problem stable on the meta-level, or is there a way to meaningfully define "not drifting from the proxy" without solving alignment in general?

Intuitively I'd guess this falls into the "don't try to be cute" class of thought, but I was afraid to post at all and decided that I wanted to interact, even at the cost of (probably) saying something embarrassing.
