Predictions & Self-awareness

Some animals, if put in front of a mirror, will notice that there is some kind of moving animal-ish thing in front of them. They are "aware of themselves", but they are not necessarily "self-aware" in the sense we normally use the term. The animals that pass the mirror test are the ones that realize the moving animal-ish thing is them.

Suppose we create a powerful AI system that uses (un)supervised learning techniques to understand and make predictions about the world. If the dataset the AI system is trained on includes data about itself, the AI system will be "aware of itself" in the sense of seeing an animal-ish thing in the mirror. Is there a risk that it could graduate to "self-awareness" in the sense of realizing that the thing in its training data is itself?

I contend this risk is low. When an animal passes the mirror test, it is noticing an isomorphism between its inborn sense of self (endowed by evolution for self-preservation) and the thing in the mirror. But if we don't endow our AI system with an inborn sense of self, there is no isomorphism to notice.

That doesn't mean purely predictive AI systems are completely safe.

Photo: Christian Holmér

AI ALIGNMENT FORUM