These are some research notes on whether we could reduce AI takeover risk by cooperating with unaligned AIs. I think the best and most readable public writing on this topic is “Making deals with early schemers”, so if you haven't read that post, I recommend starting there. These notes were...
In the future, we might accidentally create AIs with ambitious goals that are misaligned with ours. But just because we don't share the same goals doesn't mean we need to be in conflict. We could instead cooperate and pursue mutually beneficial deals. For previous discussion of this,...
We’ve written a new report on the threat of AI-enabled coups. I think this is a very serious risk – comparable in importance to AI takeover but much more neglected. In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here’s a very basic threat...
On my newly-revived blog, I've written several posts about the plausible implications of "evidential cooperation in large worlds" (ECL). This is a cross-post of the first. If you want to see the rest of the posts, you can either go to the blog or click through the links in this...
Two and a half years ago, I wrote Extrapolating GPT-N performance, trying to predict how fast scaled-up models would improve on a few benchmarks. One year ago, I added PaLM to the graphs. Another spring has come and gone, and there are new models to add to the graphs: PaLM-2...
As AI systems get more capable, they may at some point be able to help us with alignment research. This increases the chance that things turn out ok.[1] Right now, we don’t have any particularly scalable or competitive alignment solutions. But the methods we do have might let us use...