Summary: To quickly transform the world, it's not enough for AI to become super smart (the "intelligence explosion"). AI will also have to turbocharge the physical world (the "industrial explosion"). Think robot factories building more and better robot factories, which build more and better robot factories, and so on. The...
We’ve written a new report on the threat of AI-enabled coups. I think this is a very serious risk – comparable in importance to AI takeover but much more neglected. In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here’s a very basic threat...
Abstract: Once AI systems can design and build even more capable AI systems, we could see an intelligence explosion, where AI capabilities rapidly increase to well past human performance. The classic intelligence explosion scenario involves a feedback loop where AI improves AI software. But AI could also improve other inputs...
The usual basic framing of alignment looks something like this: we have a system “A” which we are trying to align with a system “H”, which should establish some alignment relation “f” between the two systems. Generally, as a result, the aligned system A should do “what the system H wants”. Two...
A paper covering some of the same ideas is now available at https://arxiv.org/abs/2311.10215. Prelude: when GPT first hears its own voice. Imagine humans in Plato’s cave, interacting with reality by watching the shadows on the wall. Now imagine a second cave, further away from the real world. GPT trained on text...
Prelude: sharks, aliens, and AI. If you go back far enough, the ancestors of sharks and dolphins look really different: an acanthodian, ancestor to modern sharks[1]; a pakicetus, ancestor to modern dolphins[2]. But modern-day sharks and dolphins have very similar body shapes: bodies of a shark, an ichthyosaur, and a dolphin. Generated...
When we're trying to do AI alignment, we're often studying systems which don't yet exist. This is a pretty weird epistemic activity, and seems really hard to get right. This post offers one frame for thinking about what we're actually doing when we're thinking about AI alignment: using parts of...