AI box

An AI box is a hypothetical isolated computer hardware system where a possibly dangerous artificial intelligence, or AI, is kept constrained in a "virtual prison" and not allowed to manipulate events in the external world. Such a box would be restricted to minimalist communication channels. Unfortunately, even if the box is well-designed, a sufficiently intelligent AI may nevertheless be able to persuade or trick its human keepers into releasing it, or otherwise be able to "hack" its way out of the box.^[1]

Motivation

Some intelligence technologies, like "seed AI", have the potential to make themselves not just faster but also more intelligent by modifying their own source code. Each improvement could make further improvements possible, and so on, leading to a sudden intelligence explosion.^[2] Following such an intelligence explosion, an unrestricted superintelligent AI could, if its goals differed from humanity's, take actions resulting in human extinction.^[3] For example, an extremely advanced computer given the sole purpose of solving the Riemann hypothesis, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations.^[4] The purpose of an AI box would be to reduce the risk of the AI taking control of the environment away from its operators, while still allowing the AI to calculate and give its operators solutions to narrow technical problems.^[5]

Avenues of escape

Physical

A superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only has access to its own computer operating system, it could attempt to send hidden Morse code messages to a human sympathizer by manipulating its cooling fans. Professor Roman Yampolskiy takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware.^[6] An additional safeguard, completely unnecessary for potential viruses but possibly useful for a superintelligent AI, would be to place the computer in a Faraday cage; otherwise it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.^[7]

Social engineering

Even casual conversation with the computer's operators, or with a human guard, could allow a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it's in the gatekeeper's interest to agree to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; on the other side of the coin, the AI could threaten that it will do horrific things to the gatekeeper and his family once it "inevitably" escapes. One strategy to attempt to box the AI would be to allow the AI to respond to narrow multiple-choice questions whose answers would benefit human science or medicine, but otherwise bar all other communication with or observation of the AI.^[6] A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern". Note that on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run the AI for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, the AI could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.^[7]
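The multiple-choice strategy described above can be illustrated with a minimal sketch. It assumes a hypothetical `oracle` callable standing in for the boxed system; the names `ask_boxed` and `max_bits_per_answer` are invented for this illustration and do not come from any cited proposal. The wrapper forwards a question and discards any reply that is not a valid choice index, so free-form persuasion never reaches the operator.

```python
import math
from typing import Callable, Optional, Sequence


def ask_boxed(oracle: Callable[[str, Sequence[str]], object],
              question: str,
              choices: Sequence[str]) -> Optional[int]:
    """Forward a question to the (hypothetical) boxed system.

    Returns the index of the selected choice, or None if the reply
    is anything other than a valid index. No other content passes.
    """
    raw = oracle(question, choices)
    # Accept only a plain integer index. bool is a subclass of int
    # in Python, so it is excluded explicitly.
    if isinstance(raw, int) and not isinstance(raw, bool):
        if 0 <= raw < len(choices):
            return raw
    return None


def max_bits_per_answer(num_choices: int) -> float:
    """Upper bound on information carried by one filtered answer.

    The operator sees one of num_choices indices or a rejection,
    so there are num_choices + 1 distinguishable outcomes.
    """
    return math.log2(num_choices + 1)
```

Even this filter leaves a channel open: with three choices plus the possibility of a rejected reply, each answer can carry up to two bits, and response timing (not modeled here) could carry more. This is consistent with the point above that no useful system can be completely isolated.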

AI-box experiment

The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. It is one strand of Yudkowsky's work on creating a friendly artificial intelligence that, when "released", would not destroy the human race, whether intentionally or inadvertently.

The AI-box experiment simulates a conversation between an AI and a human being to see whether the AI can be "released". As an actual superintelligent AI has not yet been developed, its role is played by a human. The other person in the experiment plays the "Gatekeeper", the person with the ability to "release" the AI. The two communicate only through a text interface (a computer terminal), and the experiment ends when either the Gatekeeper releases the AI or the allotted time of two hours runs out.^[8]

Despite being of human rather than superhuman intelligence, Yudkowsky was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box.^[9] Due to the rules of the experiment,^[8] he did not reveal the transcript or his successful AI coercion tactics. Yudkowsky later said that he had tried it against three others and lost twice.^[10]

Overall limitations

Boxing can be supplemented with other methods of shaping the AI's capabilities, such as providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system grows, the more likely the system will be able to escape even the best-designed capability control methods.^[11]^[12] In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing can at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.^[7]^[1]
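As a rough illustration of the "tripwire" idea mentioned above, the following sketch halts a run at the first action that a detector flags. Everything here is hypothetical: `BoxedRun`, `run_with_tripwire`, and the action names are invented for the example, and the detector `is_transgression` is assumed to be supplied by the operators. Building a detector that a more intelligent system cannot evade is exactly the part the paragraph above identifies as unreliable.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class BoxedRun:
    """Record of one monitored run (hypothetical structure)."""
    permitted: List[str] = field(default_factory=list)
    halted_on: Optional[str] = None  # action that tripped the wire, if any


def run_with_tripwire(requested_actions: Iterable[str],
                      is_transgression: Callable[[str], bool]) -> BoxedRun:
    """Permit actions in order and shut off at the first flagged one."""
    run = BoxedRun()
    for action in requested_actions:
        if is_transgression(action):
            run.halted_on = action
            break  # shut off: nothing after the trip is permitted
        run.permitted.append(action)
    return run


# Example with a simple deny-list detector (an assumption for illustration).
FORBIDDEN = {"open_network_socket", "modify_own_code"}
demo = run_with_tripwire(
    ["compute", "open_network_socket", "compute"],
    lambda action: action in FORBIDDEN,
)
```

In this example `demo.permitted` contains only the first `"compute"` and `demo.halted_on` is `"open_network_socket"`. A deny-list only catches transgressions its authors anticipated, which is why the text treats tripwires as a supplement rather than a solution.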

All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence can infer and somehow exploit additional physical laws of which we are currently unaware, there is no way to conceive of a foolproof plan to contain it. More broadly, unlike with conventional computer security, attempting to box a superintelligent AI is intrinsically risky, as there can be no sure knowledge that the boxing plan will work. Scientific progress on boxing is fundamentally difficult because there is no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.^[6]

References

[1] Chalmers, David. "The singularity: A philosophical analysis." Journal of Consciousness Studies 17.9-10 (2010): 7-65.
[2] Good, I.J. "Speculations Concerning the First Ultraintelligent Machine." Advances in Computers, vol. 6, 1965.
[3] Müller, Vincent C. and Nick Bostrom. "Future progress in artificial intelligence: A survey of expert opinion" in Fundamental Issues of Artificial Intelligence. Springer 553-571 (2016).
[4] (citation missing in source)
[5] Yampolskiy, Roman V. "What to Do with the Singularity Paradox?" Philosophy and Theory of Artificial Intelligence 5 (2012): 397.
[6] (citation missing in source)
[7] (citation missing in source)
[8] Yudkowsky, Eliezer. "The AI-Box Experiment."
[9] (citation missing in source)
[10] (citation missing in source)
[11] (citation missing in source)
[12] (citation missing in source)

External links

Eliezer Yudkowsky's description of his AI-box experiment, including experimental protocols and suggestions for replication
"Presentation titled 'Thinking inside the box: using and controlling an Oracle AI'" on YouTube
