But if UDT starts with a broad prior, it will probably not learn, because it will have some weird stuff in its prior which causes it to obey random imperatives from imaginary Lizards.
I don’t think this necessarily follows? For there to be a systematic impact on UDT’s behavior, there would need to be more Lizard-Worlds that reward X than Anti-Lizard-Worlds that penalize X; if the prior weighs the two symmetrically, their contributions to expected utility cancel out. So this is only a concern if there is reason to believe that there are “more” worlds (in an abstract logical-probability sense) favoring one specific direction.
This could still cause problems in particular cases, but (at least to me) the problem doesn’t seem as ubiquitous as the essay makes it out to be.
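To make the cancellation point concrete, here is a toy expected-utility calculation (my own sketch, not from the essay; the world names, prior weights, and payoffs are all made up for illustration). With equal prior weight on a Lizard-World and an Anti-Lizard-World, their payoffs cancel and the policy comparison is driven entirely by the ordinary worlds:

```python
# Toy sketch: symmetric Lizard-World / Anti-Lizard-World terms cancel in the
# expected-utility comparison, so they don't shift the policy choice.
# All numbers here are invented purely for illustration.

worlds = {
    "normal world":      {"prior": 0.98, "obey": -1.0,   "ignore": 0.0},
    "Lizard-World":      {"prior": 0.01, "obey": +100.0, "ignore": 0.0},
    "Anti-Lizard-World": {"prior": 0.01, "obey": -100.0, "ignore": 0.0},
}

def expected_utility(policy: str) -> float:
    """Expected utility of a policy ('obey' or 'ignore') under the toy prior."""
    return sum(w["prior"] * w[policy] for w in worlds.values())

if __name__ == "__main__":
    for policy in ("obey", "ignore"):
        print(f"{policy}: {expected_utility(policy):+.2f}")
    # The +100 and -100 Lizard terms cancel (equal prior weight), so 'ignore'
    # wins on the normal-world payoff alone. Only an asymmetry in the prior
    # over Lizard-Worlds would change the conclusion.
```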