sairjy — AI Alignment Forum

We could study such a learning process, but I am afraid that the lessons learned won't be so useful.

Even among human beings, there is huge variability in how strongly those emotions arise and, when they do, in how much they affect behavior. Worse, humans tend to hack these feelings (amplifying or dampening them) to achieve other goals: e.g., MDMA to increase love/empathy, or drugs given to soldiers to make them soulless killers.

An AGI will have a much easier time hacking these pro-social reward functions.

Humans provide an untapped wealth of evidence about alignment

sairjy · 3y · 0 · -2

sairjy · 3y · 0 · -1

Human beings and other animals have parental instincts (and, more generally, empathy) because these traits were evolutionarily advantageous for the populations that developed them.

AGI won't be subjected to the same evolutionary pressures, so every alignment strategy relying on empathy or social reward functions is, in my opinion, hopelessly naive.

AI ALIGNMENT FORUM
AF