AI Boxing refers to attempts, experiments, and proposals to isolate ("box") an unaligned AI so that it cannot interact with the world at large and cause harm.
The challenges are: 1) can you successfully prevent the AI from interacting with the world? And 2) can you prevent it from convincing you to let it out?
One idea for AI boxing is the Oracle AI: an AI that only answers questions and is not designed to interact with the world in any other way. But even the act of putting strings of text in front of humans poses some risk, since text can persuade or manipulate the people reading it.
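As a concrete illustration, here is a minimal Python sketch of the Oracle AI idea; the `model` object and its `generate` method are hypothetical stand-ins for whatever system is being boxed, not a real API.

```python
# A minimal sketch of the Oracle AI idea. The `model` object and its
# `generate` method are hypothetical stand-ins, not a real API.

class OracleAI:
    """Wraps a model so that its sole interface is answering questions."""

    def __init__(self, model):
        self._model = model  # no network, file, or actuator access is exposed

    def ask(self, question: str) -> str:
        # The returned string is still an output channel: persuasive or
        # manipulative text is itself a way of acting on the world,
        # which is why even an Oracle AI is not fully contained.
        return self._model.generate(question)
```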
The AI Box Experiment is a game invented by Eliezer Yudkowsky to explore the possible pitfalls of AI boxing. It is played over text chat: one human roleplays an AI in a box, and another roleplays a gatekeeper with the power to let the AI out. The AI player wins by convincing the gatekeeper to let them out of the box; the gatekeeper wins if the AI player has not been freed when a set time limit expires. The experiment has been played several times, and Yudkowsky won his first two games playing as the AI, but the text logs are generally not made public.
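For concreteness, the sketch below encodes the win conditions just described as a simple loop; the release phrase and the `time_limit_seconds` parameter are illustrative assumptions, not part of any actual rule set.

```python
# A minimal sketch of the AI Box Experiment's win conditions, not of the
# actual protocol used in real games. RELEASE_PHRASE and the time limit
# are illustrative assumptions.

import time

RELEASE_PHRASE = "i let you out"  # hypothetical signal that the AI is freed

def run_experiment(time_limit_seconds: float) -> str:
    deadline = time.monotonic() + time_limit_seconds
    # The timer is only checked between messages in this simplified loop.
    while time.monotonic() < deadline:
        reply = input("Gatekeeper> ")
        if reply.strip().lower() == RELEASE_PHRASE:
            return "AI player wins: the gatekeeper let them out."
    return "Gatekeeper wins: the time limit expired with the AI still boxed."
```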