tardygrade — AI Alignment Forum

Early in the ELK report, it mentions that ARC doesn't believe that strategies like debate solves ELK in the worst case. Can I get some clarifications on why? Specifically, a debate inspired set-up for SafeVault could be something like:

We train the reporter to take a human belief as input (i.e. "The diamond is in the vault.") and returns a "truthful" argument that is most likely to change the human's belief.

We can guarantee "truthfulness" by for example restricting the output to be a video rendering of what happens in the vault from some camera angle.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments