























Paul thinks common stories about what AI catastrophe might look like are flawed. Here, he paints a picture of two failure scenarios that seem more likely, summarized as: "ML helps us get what we measure (creating a slow rolling catastrophe)", and "ML gives rise to greedy patterns that try to expand their influence."
A story in nine parts about someone creating an AI that predicts the future, and multiple people who wonder about the implications. What happens when the predictions influence what future happens?
In thinking about AGI safety, I’ve found it useful to build a collection of different viewpoints from people that I respect, such that I can think from their perspective. I will often try to compare what an idea feels like when I put on my Paul Christiano hat, to when I put on my Scott Garrabrant hat. Recently, I feel like I’ve gained a "Chris Olah" hat, which often looks at AI through the lens of interpretability.
The goal of this post is to try to give that hat to more people.
Eric Drexler's CAIS model suggests that before we get to a world with monolithic AGI agents, we will already have seen an intelligence explosion due to automated R&D. This reframes the problems of AI safety and has implications for what technical safety researchers should be doing. Rohin reviews and summarizes the model.
AI safety researchers have different ideas of what success would look like. This post explores five different AI safety "success stories" that researchers might be aiming for and compares them along several dimensions.