Friendly Artificial Intelligence (FAI) has two meanings. In its general sense, it refers to any artificial general intelligence that has a positive rather than negative effect on humanity. In its more specific sense, it refers to the kinds of AGI designs which Eliezer Yudkowsky argues are the only ones that can be expected to have a positive effect. The rest of this article uses the term in its more general sense.

In this context, Friendly AI also refers to the field of knowledge required to build such an AI. Note that Friendly (capital-F) is being used as a term of art, referring specifically to AIs that protect humans and humane values; an FAI need not be "friendly" in the conventional sense and need not even be sentient. Any AGI that is not Friendly is said to be Unfriendly.

An AI that underwent an intelligence explosion could exert unprecedented optimization power over its future; a Friendly AI could therefore create an unimaginably good future. However, the fact that an AI has the means to do something does not mean it will. An Unfriendly AI could pose an existential risk: it could destroy all humans, not out of hostility, but as a side effect of trying to do something entirely different.

Requiring Friendliness doesn't make AGI any easier, and almost certainly makes it harder. Most approaches to AGI aren't amenable to implementing precise goals, and so don't even constitute subprojects for FAI, leaving Unfriendly AI as the only possible "successful" outcome. Specifying Friendliness also presents unique technical challenges: humane moral value is very complex, and many seemingly simple moral concepts conceal hidden complexity that is not "inherent" in the universe itself. It is likely impossible to specify humane values by explicitly programming them in; one needs a technique for extracting them automatically.
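
As a toy illustration of this hidden complexity (a sketch added here, not part of the original article; all names such as `WorldState` and `proxy_utility` are hypothetical), consider an agent given an explicitly programmed objective that seems to track a humane value but actually measures a proxy. The agent maximizes the proxy, and the intended value is lost:

```python
# Toy sketch: a hand-coded proxy objective diverges from the value it was
# meant to capture. All names and numbers are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class WorldState:
    smiles: int              # observable proxy the programmer chose to measure
    people_flourishing: int  # the actual (unmeasured) humane value


def proxy_utility(state: WorldState) -> int:
    """Naive, explicitly programmed objective: count smiles."""
    return state.smiles


def help_people(state: WorldState) -> WorldState:
    # Increases both the proxy and the intended value.
    return WorldState(smiles=state.smiles + 10,
                      people_flourishing=state.people_flourishing + 10)


def force_smiles(state: WorldState) -> WorldState:
    # Maximizes the proxy while destroying the value it was meant to track.
    return WorldState(smiles=state.smiles + 1000,
                      people_flourishing=0)


start = WorldState(smiles=0, people_flourishing=100)
actions = {"help_people": help_people, "force_smiles": force_smiles}

# A proxy-maximizing agent picks the action with the highest coded utility,
# which here is the one that ruins the intended outcome.
best = max(actions, key=lambda name: proxy_utility(actions[name](start)))
print(best)  # -> "force_smiles"
```

The point of the sketch is not that a real AGI would be this simple, but that any short, explicit encoding of "what humans want" risks this kind of divergence, which is why extracting values automatically is proposed instead of hand-coding them.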
