AI Boxing (Containment)

Oliver Habryka v1.10.0 Sep 12th 2020 (+5/-4) Improve "see also" formatting
Ruben Bloom v1.9.0 Sep 12th 2020 (+239/-259)
Ruben Bloom v1.8.0 Sep 12th 2020 (+2996/-223)
Multicore v1.7.0 Aug 30th 2020
Multicore v1.6.0 Aug 12th 2020
Multicore v1.5.0 Aug 12th 2020 (+629)
Multicore v1.4.0 Aug 12th 2020 (+225)
Ruben Bloom v1.3.0 May 4th 2020 (+9/-18)
Oliver Habryka v1.2.0 Apr 23rd 2020 (+21)
Ruben Bloom v1.1.0 Apr 16th 2020 (+222/-2984)

Oliver Habryka v1.10.0 Sep 12th 2020 (+5/-4) Improve "see also" formatting

Escaping the box

It is not considered likely that an AGI can be kept boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (most likely its human liaison) to free it from its box and thus from human control. Some practical ways of achieving this goal include:

Offering enormous wealth, power and intelligence to its liberator
Claiming that only it can prevent an existential risk
Claiming it needs outside resources to cure all diseases
Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out

Other, more speculative ways include threatening to torture millions of conscious copies of you for thousands of years, each starting in exactly the same situation as you, in such a way that it seems overwhelmingly likely that you are a simulation. Alternatively, the AGI might discover and exploit unknown physics to free itself.

Containing the AGI

Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:

Physically isolating the AGI and permitting it zero control of any machinery
Limiting the AGI’s outputs and inputs with regard to humans
Programming the AGI with deliberately convoluted logic or homomorphically encrypting portions of it
Periodic resets of the AGI's memory
A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed
Motivational control using a variety of techniques

Simulations / Experiments

Both Eliezer Yudkowsky and Justin Corwin have run simulations in which they pretended to be a superintelligence, and they were able to convince a human playing a guard to let them out on many, but not all, occasions. Eliezer's five experiments required the guard to listen for at least two hours and used participants who had approached him, while Corwin's 26 experiments had no time limit and used subjects he had approached.

The text of Eliezer's experiments has not been made public.

List of experiments

The AI-Box Experiment, Eliezer Yudkowsky's original two tests
Shut up and do the impossible!, three other experiments Eliezer ran
AI Boxing, 26 trials run by Justin Corwin
AI Box Log, a log of a trial between MileyCyrus and Dorikka

References

Thinking inside the box: using and controlling an Oracle AI by Stuart Armstrong, Anders Sandberg, and Nick Bostrom
Leakproofing the Singularity: Artificial Intelligence Confinement Problem by Roman V. Yampolskiy
On the Difficulty of AI Boxing by Paul Christiano
Cryptographic Boxes for Unfriendly AI by Paul Christiano
The Strangest Thing An AI Could Tell You
The AI in a box boxes you

Multicore v1.7.0 Aug 30th 2020

Multicore v1.6.0 Aug 12th 2020

Multicore v1.5.0 Aug 12th 2020 (+629)

The AI Box Experiment is a game meant to explore the possible pitfalls of AI boxing. It is played over text chat, with one human roleplaying as an AI in a box and another human roleplaying as a gatekeeper with the ability to let the AI out of the box. The AI player wins by convincing the gatekeeper to let them out of the box, and the gatekeeper wins if the AI player has not been freed after a certain period of time. The AI Box Experiment has been played several times, but the text logs are generally not made public. It was invented by Eliezer Yudkowsky, who won his first two games playing as the AI.

Multicore v1.4.0 Aug 12th 2020 (+225)

One idea for AI boxing is the Oracle AI: an AI that only answers questions and isn't designed to interact with the world in any other way. But even the act of the AI putting strings of text in front of humans poses some risk.

Ruben Bloom v1.3.0 May 4th 2020 (+9/-18)

AI Boxing refers to attempts, experiments, or proposals to isolate ("box") an unaligned AI so that it can't interact with the world at large and cause harm. ~~Part of~~ See also: AI ~~Alignment~~.

Oliver Habryka v1.2.0 Apr 23rd 2020 (+21)

AI Boxing refers to attempts, experiments, or proposals to isolate ("box") an unaligned AI so that it can't interact with the world at large and cause harm. Part of AI Alignment.

Ruben Bloom v1.1.0 Apr 16th 2020 (+222/-2984)

~~An AI Box is a confined computer system in which an Artificial General Intelligence (AGI) resides, unable to interact with the external world in any way, save for limited communication with its human liaison. It is often proposed that so long as an AGI is physically isolated and restricted, or "boxed", it will be harmless even if it is an unfriendly artificial intelligence (UAI).~~

AI Boxing refers to attempts, experiments, or proposals to isolate ("box") an unaligned AI so that it can't interact with the world at large and cause harm.

Escaping the box

~~It is not considered likely that an AGI can be kept boxed in the long term. Since the AGI might be a superintelligence, it could persuade someone (most likely its human liaison) to free it from its box and thus from human control. Some practical ways of achieving this goal include:~~

~~Offering enormous wealth, power and intelligence to its liberator~~
~~Claiming that only it can prevent an existential risk~~
~~Claiming it needs outside resources to cure all diseases~~
~~Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out~~

~~Other, more speculative ways include threatening to torture millions of conscious copies of you for thousands of years, each starting in exactly the same situation as you, in such a way that it seems overwhelmingly likely that you are a simulation. Alternatively, the AGI might discover and exploit unknown physics to free itself.~~

Challenges are: 1) can you successfully prevent it from interacting with the world? And 2) can you prevent it from convincing you to let it out?

Containing the AGI

~~Attempts to box an AGI may add some degree of safety to the development of a friendly artificial intelligence (FAI). A number of strategies for keeping an AGI in its box are discussed in Thinking inside the box and Leakproofing the Singularity. Among them are:~~

~~Physically isolating the AGI and permitting it zero control of any machinery~~
~~Limiting the AGI’s outputs and inputs with regard to humans~~
~~Programming the AGI with deliberately convoluted logic or homomorphically encrypting portions of it~~
~~Periodic resets of the AGI's memory~~
~~A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed~~
~~Motivational control using a variety of techniques~~

Simulations

~~Both Eliezer Yudkowsky and Justin Corwin have run simulations in which they pretended to be a superintelligence, and they were able to convince a human playing a guard to let them out on many, but not all, occasions. Eliezer's five experiments required the guard to listen for at least two hours and used participants who had approached him, while Corwin's 26 experiments had no time limit and used subjects he had approached.~~

List of experiments

~~The AI-Box Experiment, Eliezer Yudkowsky's original two tests~~
~~Shut up and do the impossible!, three other experiments Eliezer ran~~
~~AI Boxing, 26 trials run by Justin Corwin~~
~~AI Box Log, a log of a trial between MileyCyrus and Dorikka~~

References

~~Thinking inside the box: using and controlling an Oracle AI~~ ~~by Stuart Armstrong, Anders Sandberg, and Nick Bostrom~~
~~Leakproofing the Singularity: Artificial Intelligence Confinement Problem~~ ~~by Roman V. Yampolskiy~~
~~On the Difficulty of AI Boxing~~ ~~by Paul Christiano~~
~~Cryptographic Boxes for Unfriendly AI~~ ~~by Paul Christiano~~
~~The Strangest Thing An AI Could Tell You~~
~~The AI in a box boxes you~~
