AI ALIGNMENT FORUM

tardygrade
Comments

Prizes for ELK proposals
tardygrade4y10

Early in the ELK report, ARC says it doesn't believe that strategies like debate solve ELK in the worst case. Could someone clarify why? Specifically, a debate-inspired setup for SmartVault could look something like this:

We train the reporter to take a human belief as input (e.g. "The diamond is in the vault") and return the "truthful" argument that is most likely to change that belief.

We can guarantee "truthfulness" by, for example, restricting the output to a video rendering of what happens in the vault from some camera angle.
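
To make the proposal concrete, here is a minimal toy sketch of the selection step, assuming the reporter can only choose among a fixed set of renderings and that we have some model of how a human updates on a video. Every name here (`Rendering`, `pick_argument`, `toy_human_update`) is hypothetical and not taken from the ELK report; the numbers in the toy human model are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rendering:
    """A candidate 'argument': a video of the vault from one camera angle (hypothetical type)."""
    camera_angle: str
    shows_diamond: bool  # whether the diamond appears present from this angle


def pick_argument(
    prior: float,
    renderings: Sequence[Rendering],
    update: Callable[[float, Rendering], float],
) -> Rendering:
    """Return the rendering that moves the human's credence furthest from `prior`."""
    return max(renderings, key=lambda r: abs(update(prior, r) - prior))


def toy_human_update(prior: float, rendering: Rendering) -> float:
    """Made-up human model: confirming footage nudges credence up,
    disconfirming footage cuts it sharply."""
    if rendering.shows_diamond:
        return min(1.0, prior + 0.05)
    return prior * 0.1


# The human starts out fairly sure that the diamond is in the vault.
prior = 0.9
renderings = [
    Rendering("front", shows_diamond=True),
    Rendering("top", shows_diamond=True),
    Rendering("behind-pedestal", shows_diamond=False),
]
best = pick_argument(prior, renderings, toy_human_update)
```

In this toy case the reporter selects the one angle from which the diamond does not appear, since that footage shifts the human's belief the most. The sketch covers only the selection objective; it says nothing about whether restricting outputs to renderings actually guarantees truthfulness.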
