Summary: Bounded agents might be unaware of possibilities relevant to their decision-making. That is, they may not just be uncertain, but fail to conceive of some relevant hypotheses entirely. What's more, commitment races might pressure early AGIs into adopting an updateless policy from a position of limited awareness. What happens...
In this post, we look at conditions under which intent alignment isn't sufficient, or isn't necessary, for interventions on AGI systems to effectively reduce the risks of (unendorsed) conflict. We then conclude this sequence by listing what we currently think are relatively promising directions for technical...
Here we will look at two of the claims introduced in the previous post: AGIs might not avoid conflict that is costly by their lights (Capabilities aren't Sufficient), and conflict that is costly by our lights might not be costly by the AGIs' (Conflict isn't Costly). Explaining costly conflict: First...
This is a pared-down version of a longer draft report. We went with a more concise version to get it out faster, so it ended up being more of an overview of definitions and concepts, and is thin on concrete examples and details. Hopefully subsequent work will help fill those...
Introduction: We at the Center on Long-Term Risk (CLR) are focused on reducing risks of cosmically significant amounts of suffering[1], or s-risks, from transformative artificial intelligence (TAI). We currently believe that: * Several agential s-risks — in which agents deliberately cause great amounts of suffering — are among the largest...
I haven't seen much discussion of this, but it seems like an important factor in how well AI systems deployed by actors with different goals manage to avoid conflict (cf. my discussion of equilibrium and prior selection problems here). For instance, would systems be trained

* Against copies of agents...
To avoid catastrophic conflict in multipolar AI scenarios, we would like to design AI systems such that AI-enabled actors will tend to cooperate. This post is about some problems facing this effort and some possible solutions. To explain these problems, I'll take the view that the agents deployed by AI...