Theory of mind is something that humans have instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately.

I think this is the weakest part. Consider: "Recognizing cat pictures is something humans can do instinctively and subconsciously, but that isn't easy to spell out explicitly; therefore, by Moravec's paradox, it will be very hard to implant it into an AI, and this needs to be done deliberately." But in practice, the techniques that work best for cat pictures work well for lots of other things as well, and a hardcoded solution customized for cat pictures will actually tend to underperform.

Stuart_Armstrong (6y)

I'm actually willing to believe that methods used for cat pictures might work for human theory of mind - if trained on that data (and this doesn't solve the underdefined problem).

Michaël Trazzi (6y)

Having printed and read the full version, I found this ultra-simplified version a useful summary.

Happy to read a (not-so-)simplified version (like 20-30 paragraphs).

AI ALIGNMENT FORUM
Ultra-simplified research agenda

15