Many people believe that understanding "agency" is crucial for alignment, but as far as I know, there isn't a canonical list of reasons why we care about agency. Please describe below any reasons why we might care about the concept of agency for understanding alignment. If you have multiple reasons, please post them as separate answers.
Please also try to be as specific as possible about what our goal is in the scenario. For example:
We want to know what an agent is so that we can determine whether or not a given AI is a dangerous agent
Whilst useful, this isn't quite as good as:
We have an AI which may or may not have goals aligned with us and we want to know how dangerous it would be if it weren't aligned. We want to use interpretability tools to determine the extent to which it will pursue instrumental incentives and this would be easier if we knew exactly what we were looking for.
In particular, we can set up a scale with agents which have only learned to pursue instrumental incentives based upon a few heuristics they learned during training on one end, and an agent which maximises its goals during deployment on the other (the latter is seen as more agentic). We would like the ability to figure out where a particular agent is on this scale just by looking at its weights, and we'd ideally like a general technique rather than a technique specific to one kind of unwanted instrumental incentive. In this context, we care about finding a definition of agency that helps us construct a scale that roughly corresponds to how worried we should be about a system having instrumental incentives.
In a few days, I'll add any use cases I'm aware of myself that either haven't been covered or that I don't think have been adequately explained by other answers.
Summary: John describes the problems of inner and outer alignment. He also describes the concept of True Names - mathematical formalisations that hold up under optimisation pressure. He suggests that having a "True Name" for optimisers would be useful if we wanted to inspect a trained system for an inner optimiser without risking missing something.
He further suggests that the concept of agency breaks down into lower-level components like "optimisation", "goals", "world models", etc. It would be possible to make further arguments about how these lower-level concepts are important for AI safety.