The idea that "Agents are systems that would adapt their policy if their actions influenced the world in a different way." works well on mechanised CIDs whose variables are neatly divided into object-level and mechanism nodes: we simply check for a path from a utility function F_U to a policy Pi_D. But to apply this to a physical system, we would need a way to obtain such a partition those variables. Specifically, we need to know (1) what counts as a policy, and (2) whether any of its antecedents count as representations of "influence" on the world (and after all, antecedents A of the policy can only be 'representations' of the influence, because in the real world, the agent's actions cannot influence themselves by some D->A->Pi->D loop). Does a spinal reflex count as a policy? Does an ant's decision to fight come from a representation of a desire to save its queen? How accurate does its belief about the forthcoming battle have to be before this representation counts? I'm not sure the paper answers these questions formally, nor am I sure that it's even possible to do so. These questions don't seem to have objectively right or wrong answers.

So we don't really have any full procedure for "identifying agents". I do think we gain some conceptual clarity. But on my reading, this clear definition serves to crystallise how hard it is to identify agents, moreso than it shows practically how it can be done.

(NB. I read this paper months ago, so apologies if I've got any of the details wrong.)

Reply

[-]tom4everitt3y20

The idea ... works well on mechanised CIDs whose variables are neatly divided into object-level and mechanism nodes. ... But to apply this to a physical system, we would need a way to obtain such a partition those variables

Agree, the formalism relies on a division of variable. One thing that I think we should perhaps have highlighted much more is Appendix B in the paper, which shows how you get a natural partition of the variables from just knowing the object-level variables of a repeated game.

Does a spinal reflex count as a policy?

A spinal reflex would be different if humans had evolved in a different world. So it reflects an agentic decision by evolution. In this sense, it is similar to the thermostat, which inherits its agency from the humans that designed it.

Does an ant's decision to fight come from a representation of a desire to save its queen?

Same as above.

How accurate does its belief about the forthcoming battle have to be before this representation counts?

One thing that I'm excited about to think further about is what we might call "proper agents", that are agentic in themselves, rather than just inheriting their agency from the evolution / design / training process that made them. I think this is what you're pointing at with the ant's knowledge. Likely it wouldn't quite be a proper agent (but a human would, as we are able to adapt without re-evolving in a new environment). I have some half-developed thoughts on this.

Reply

[-]Algon3y20

How computantially expensive is this to implement? When I looked into CID a couple of years ago, figuring out a causal graph for an agent/environment was quite costly, which would make adoption harder.

Reply

[-]zac_kenton3y20

I haven't considered this in great detail, but if there are variables, then I think the causal discovery runtime is $O (N^{2})$ . As we mention in the paper (footnote 5) there may be more efficient causal discovery algorithms that make use of certain assumptions about the system.

On adoption, perhaps if one encounters a situation where the computational cost is too high, one could coarse-grain their variables to reduce the number of variables. I don't have results on this at the moment but I expect that the presence of agency (none, or some) is robust to the coarse-graining, though the exact number of agents is not (example 4.3), nor are the variables identified as decisions/utilities (Appendix C).

Reply

[-]tom4everitt3y17

The way I see it, the primary value of this work (as well as other CID work) is conceptual clarification. Causality is a really fundamental concept, which many other AI-safety relevant concepts build on (influence, response, incentives, agency, ...). The primary aim is to clarify the relationships between concepts and to derive relevant implications. Whether there are practical causal inference algorithms or not is almost irrelevant.

TLDR: Causality > Causal inference :)

Reply

Moderation Log