AI ALIGNMENT FORUM
Calvin McCarter — AI Alignment Forum

Calvin McCarter has not written any posts yet.

Replying to ELK prize results

Calvin McCarter · 4y

Below is a summary of my honorable-mention proposal, which tries to induce mind-blindness in the reporter. I've also posted the full contest entry as a blog post (https://calvinmccarter.wordpress.com/2022/02/19/mind-blindness-strategy-for-eliciting-latent-knowledge/). I haven't yet fully analyzed whether it survives the counterexamples discussed previously, or whether it has vulnerabilities that other strategies lack, so I'd welcome feedback on both questions.

The overall strategy is to avoid training a “human simulator” reporter by regularizing the reporter's internal state toward mind-blindness. One could imagine training a “Human Simulator” that takes as input the “what’s going on” state plus a question about what a human believes about the world, and is trained to maximize its accuracy at...
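The entry is cut off here, so what follows is not the author's method. It is a generic toy of adversarial regularization, one common way to make a learned state uninformative about some variable, built on loudly hypothetical assumptions: the reporter's state is a single linear feature `z = w1*a + w2*b` of a "world fact" `a` and a "human belief" feature `b`; an exact least-squares readout of `b` from `z` stands in for the Human Simulator; the task label `a + b` deliberately leaks `b`, so the reporter has a reason to encode it; and `train`, `lam`, and `DATA` are invented for illustration.

```python
# HYPOTHETICAL toy, not the proposal's actual architecture.
# Each example has two features:
#   a = a "world fact" the reporter should encode
#   b = a "human belief" feature the reporter should be blind to
DATA = [(a, b) for a in (-1.0, 1.0) for b in (-1.0, 1.0)]


def mean(xs):
    return sum(xs) / len(xs)


def train(lam, steps=3000, lr=0.01):
    """Alternate between fitting an adversary and updating a 1-D linear
    reporter state z = w1*a + w2*b. Returns the final (w1, w2)."""
    w1, w2 = 1.0, 1.0
    for _ in range(steps):
        zs = [w1 * a + w2 * b for a, b in DATA]
        # Adversary (stand-in for the "Human Simulator"): the best linear
        # readout v of b from z, fitted exactly by least squares each step.
        v = sum(z * b for z, (_, b) in zip(zs, DATA)) / sum(z * z for z in zs)
        # Gradient of the task loss mean((z - y)^2), label y = a + b.
        task_g1 = mean([2 * (z - (a + b)) * a for z, (a, b) in zip(zs, DATA)])
        task_g2 = mean([2 * (z - (a + b)) * b for z, (a, b) in zip(zs, DATA)])
        # Gradient of the adversary loss mean((v*z - b)^2) with respect to
        # the reporter weights, holding the fitted v fixed.
        adv_g1 = mean([2 * (v * z - b) * v * a for z, (a, b) in zip(zs, DATA)])
        adv_g2 = mean([2 * (v * z - b) * v * b for z, (a, b) in zip(zs, DATA)])
        # The reporter minimizes task loss MINUS lam * adversary loss, i.e.
        # it is rewarded when the adversary cannot recover b from z.
        w1 -= lr * (task_g1 - lam * adv_g1)
        w2 -= lr * (task_g2 - lam * adv_g2)
    return w1, w2


if __name__ == "__main__":
    print("no penalty:", train(lam=0.0))
    print("mind-blindness penalty:", train(lam=10.0))
```

In this toy, `train(lam=0.0)` leaves `w2` at 1.0, while `train(lam=10.0)` shrinks it to roughly 0.1 but not to zero: as `w2` approaches 0 the adversary's best readout `v` also approaches 0, so the penalty's gradient vanishes and the task loss keeps pulling `w2` up. A squared-loss adversary of this kind therefore gives only approximate mind-blindness, with the residual leakage controlled by `lam`.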
