Agency may be a convergent property of most AI systems (or at least, of many systems people are likely to try to build), once those systems reach a certain capability level. The simplest and most useful way to predict the behavior of such systems may therefore be to model them as agents.
Perhaps we can avoid the problems posed by agency by building only tool AI. Even then, we probably still need a deep understanding of agency to make sure we avoid building an agent by accident. Instrumental convergence may imply that all sufficiently powerful AI systems eventually start looking like agents. That said, the capability level at which a particular system is best modeled as an agent may depend on the particulars of that system, and we may want to push that threshold out as far as possible.
Boiling this down to a single specific reason why we should care about agency: **the concept of agency is likely to be key for creating simple, predictively accurate models of many kinds of powerful AI systems**, regardless of whether the builders of those systems:
- deeply understand the concepts of agency (or alignment) themselves
- are deliberately trying to build an agent, or deliberately trying to avoid that, or just trying to build the most powerful system as fast as possible, without explicitly trying to avoid or target agency at all. (We seem to be in a world in which different people are trying all three of these things simultaneously.)
A few arguments or stubs of arguments for why the bolded claim is correct and important:
- Agency is already a useful model of humans and human behavior in many situations.
- Agency is already a useful model of some current AI systems: MuZero, Dreamer, and Stockfish, in the domains of their respective game worlds. It might soon be a useful model of constructions like Auto-GPT, in the domain of the real world. (A minimal sketch of what this kind of model buys you follows this list.)
- If the hypothesis that agency is instrumentally convergent is true, then agency will be important for understanding all AI systems above a certain capability level, whatever their architecture.
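To make the bolded claim slightly more concrete, here is a minimal sketch (in Python) of what an "agent model" is as a predictive device: you predict a system's behavior from just two assumptions, a goal (a utility function) and optimization (the system picks whichever available action best serves that goal), with no reference to the system's internals. The function names, the candidate moves, and the evaluation numbers below are illustrative assumptions for the sketch, not drawn from Stockfish or any real engine's API.

```python
# A sketch of the "model it as an agent" idea: instead of simulating a
# system's internals, predict its behavior from (1) what it wants and
# (2) the assumption that it acts so as to get it.

from typing import Callable, Iterable, TypeVar

Action = TypeVar("Action")

def predict_action(
    actions: Iterable[Action],
    utility: Callable[[Action], float],
) -> Action:
    """Predict an agent's choice as the utility-maximizing action."""
    return max(actions, key=utility)

# Toy example: predict a chess engine's move without knowing anything
# about its search algorithm, just by assuming it maximizes its own
# position evaluation. (Moves and evaluations here are made up.)
candidate_moves = {"e4": 0.31, "d4": 0.30, "h4": -0.55}
print(predict_action(candidate_moves, candidate_moves.get))  # -> "e4"
```

The point is the compression: a few numbers and an argmax stand in for the engine's entire search code, and the same template transfers unchanged to other systems in the domains where the agent model fits them.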