Moral Divergence

Further Reading & References

Chalmers, David John. (1996). The conscious mind: In search of a fundamental theory. Philosophy of Mind Series

Haidt, Jonathan. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108, (4), 814–834

Wallach, Wendell, Colin Allen, and Iva Smit. (2008). Machine morality: Bottom-up and top-down approaches for modelling human moral faculties. In Ethics and artificial agents. Special issue, AI & Society, 22, (4)

Yudkowsky, Eliezer. (2008). Artificial intelligence as a positive and negative factor in global risk. In Global catastrophic risks, ed. Nick Bostrom and Milan M. Ćirković, 308–345

Moral divergence is the implicit characterization of the current state of consequentialist and utilitarian moral theories offered by Carl Shulman and colleagues in their paper Which Consequentialism? Machine Ethics and Moral Divergence. Several authors have suggested theories of this kind as organizing principles for the development of machine ethics. The paper, however, emphasizes how much these theories vary, the obstacle this divergence poses, and how one might begin to resolve it.

Consequentialism is currently seen as having a great number of "free variables", that is, dimensions along which its versions differ. Hedonistic utilitarianism, for example, requires a description of experiential states, from pleasure to pain, but utilitarians disagree about how much to value each state. Preference utilitarians, meanwhile, debate the definition of even simple and crucial terms such as "preference", and dispute the value of the actual satisfaction of preferences versus the mere experience of satisfaction. More broadly, all utilitarians accept the need for interpersonal comparison and aggregation of utilities, which itself introduces more free variables, such as whether to sum or average utilities. The picture emerging from the field is therefore confused, with no clear answer as to what kind of utility function to implement in AMAs (Artificial Moral Agents).
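The "sum or average" free variable can be made concrete with a small sketch. The welfare numbers and function names below are hypothetical, chosen only for illustration and not taken from the paper; the point is simply that two aggregation rules applied to the same individual utilities can rank the same pair of outcomes in opposite orders.

```python
from statistics import mean


def total_utility(utilities):
    """Total-style aggregation: add up everyone's utility."""
    return sum(utilities)


def average_utility(utilities):
    """Average-style aggregation: mean utility per person."""
    return mean(utilities)


def preferred(aggregate, outcome_a, outcome_b):
    """Return the label of the outcome that scores higher under `aggregate`."""
    return "a" if aggregate(outcome_a) > aggregate(outcome_b) else "b"


# Hypothetical welfare levels, one number per person (illustrative assumption).
small_happy = [9, 9, 9]    # few people, each very well off
large_modest = [3] * 12    # many people, each modestly well off

print("total prefers:  ", preferred(total_utility, small_happy, large_modest))
print("average prefers:", preferred(average_utility, small_happy, large_modest))
```

Under these assumed numbers the total rule favors the larger population and the average rule favors the smaller one, so an AMA's choice would depend on which rule its designers happened to build in.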

The inadequacy of current moral theories for the field of machine ethics is further explored through an analysis of the behavior produced by these different consequentialist beliefs, which appears rather similar. This could be taken to imply that AMAs with diverse consequentialist views would converge in their behavior. This view, however, is considered mistaken, since most current knowledge from neuroscience and moral psychology indicates that moral decisions emerge from unconscious intuitions and that their explicit justification only comes afterwards. It would follow that AMAs developed with explicit moral systems could be oblivious to any informal intuitions that the designers failed to specify.

The authors conclude by rejecting views that favor this kind of top-down, explicit implementation of moral systems, especially given the difficulty of reaching a consensus purely through philosophical debate. At the same time, they advise caution about implementing bottom-up approaches, because the AI might learn the wrong values through improper generalization. Instead, they propose supplementing these theories with knowledge about moral decision-making and behavior from neuroscience and from experimental and moral psychology.
