Roko's Basilisk - AI Alignment Forum

Roko's basilisk is a thought experiment proposed in 2010 by the user Roko on the Less Wrong community blog. Roko used ideas in decision theory to argue that a sufficiently powerful AI agent would have an incentive to torture anyone who imagined the agent but didn't work to bring the agent into existence. The argument was called a "basilisk" (named after the legendary reptile that can cause death with a single glance) because merely hearing the argument would supposedly put you at risk of torture from this hypothetical agent. A basilisk in this context is any information that harms or endangers the people who hear it.

Roko's argument was broadly rejected on Less Wrong, with commenters objecting that an agent like the one Roko described would have no real reason to follow through on its threat: once the agent exists, it will by default see torturing people for their past decisions as a waste of resources, since doing so does not causally further its plans. A number of decision algorithms can follow through on acausal threats and promises, via the same methods that permit mutual cooperation in prisoner's dilemmas; but this doesn't imply that agents using such algorithms can be blackmailed. Moreover, following through on blackmail threats against such an agent requires a large amount of shared information and trust between the agents, which does not appear to exist in the case of Roko's basilisk.
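
The causal part of the objection above can be illustrated with a toy comparison. This is only a sketch under stated assumptions: the action names, the resource units, and the numbers are all hypothetical, and the model covers only an agent that scores actions by their future causal consequences. It does not model the acausal decision algorithms mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    resource_cost: float   # resources spent on the action (hypothetical units)
    future_benefit: float  # causal contribution to the agent's goals from now on


def causal_value(action: Action) -> float:
    """Net value of an action, counting only its future causal consequences."""
    return action.future_benefit - action.resource_cost


def best_action(actions):
    """Pick the action with the highest net causal value."""
    return max(actions, key=causal_value)


# Hypothetical numbers: the agent already exists, so punishing people for
# past decisions cannot change those decisions (future_benefit = 0), yet it
# still consumes resources.
ACTIONS = [
    Action("ignore_past_non_helpers", resource_cost=0.0, future_benefit=0.0),
    Action("punish_past_non_helpers", resource_cost=5.0, future_benefit=0.0),
]

print(best_action(ACTIONS).name)  # prints "ignore_past_non_helpers"
```

Under these assumptions the punishing action is strictly worse for any positive resource cost, which is the sense in which commenters called following through a waste of resources.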

Less Wrong's founder, Eliezer Yudkowsky, banned discussion of Roko's basilisk on the blog for several years as part of a general site policy against spreading potential information hazards. This had the opposite of its intended effect: a number of outside websites began sharing information about Roko's basilisk, as the ban attracted attention to this taboo topic. Websites like RationalWiki spread the assumption that Roko's basilisk had been banned because Less Wrong users accepted the argument; thus many criticisms of Less Wrong cite Roko's basilisk as evidence that the site's users have unconventional and wrong-headed beliefs....
