This is not a well-specified question. I don't know what "agent-like behavior" or "agent-like architecture" should mean. Perhaps the question should be "Can you define the fuzzy terms such that 'Agent-like behavior implies agent-like architecture' is true, useful, and in the spirit of the original question?" I mostly think the answer is no, but it would be really useful to know if the answer is yes, and the process of trying to make the claim true might help us triangulate what we should mean by agent-like behavior and agent-like architecture.

Now I'll say some more to try to communicate the spirit of the original question. First, a giant look-up table (GLUT) is not (directly) a counterexample. This is because it might be that the only way to produce an agent-like GLUT is to use agent-like architecture to search for it. Similarly, a program that outputs all possible GLUTs is also not a counterexample, because you might have to use your agent-like architecture to point at the specific counterexample. A longer version of the conjecture is "If you see a program that implements agent-like behavior, there must be some agent-like architecture in the program itself, in the causal history of the program, or in the process that brought your attention to the program." The pseudo-theorem I want is similar to the claim that correlation really does imply causation, or to the good regulator theorem.
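To make the GLUT point concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than part of the question: the toy one-dimensional world, the names `searching_policy`, `build_glut`, and `glut_policy`, and the choice of goal. It only shows how a table with no search inside it can inherit its behavior from a process that did search.

```python
# Hypothetical toy setting: observations are positions 0..4 on a line,
# actions are steps of -1, 0, or +1, and the goal is to reach position 3.
OBSERVATIONS = range(5)
ACTIONS = (-1, 0, 1)
GOAL = 3


def searching_policy(position):
    """Agent-like architecture: evaluate each action in a model, keep the best."""
    def predicted_distance(action):
        # The "world model" here is just: next position = position + action.
        return abs((position + action) - GOAL)
    return min(ACTIONS, key=predicted_distance)


def build_glut(policy):
    """Tabulate a policy once. The resulting dict holds no model, goal, or search."""
    return {obs: policy(obs) for obs in OBSERVATIONS}


# The search happens here, in the table's causal history.
GLUT = build_glut(searching_policy)


def glut_policy(position):
    """Pure lookup: behaves exactly like searching_policy on OBSERVATIONS."""
    return GLUT[position]
```

Inspecting `glut_policy` alone reveals no architecture, yet it agrees with `searching_policy` on every observation. In this toy case the agent-like architecture sits in `build_glut`'s input, which is the sense of "causal history" used above.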

One way of defining agent-like behavior is as that which can only be produced by an agent-like architecture. This makes the theorem trivial, and the challenge is making the theorem non-vacuous. In this light, the question is something like "Is there some nonempty class of architectures that can reasonably be described as a subclass of 'agent-like' such that the class can be equivalently specified either functionally or syntactically?" This looks like it might conflict with the spirit of Rice's theorem, but I think making it probabilistic and referring to the entire causal history of the algorithm might give it a chance of working.

One possible way of defining agent-like architecture is something like "Has a world model and a goal, and searches over possible outputs to find one such that the model believes that output leads to the goal." Many words in this will have to be defined further. World model might be something that has high logical mutual information with the environment. It might be hard to define search generally enough to include everything that counts as search. There also might be completely different ways to define agent-like architecture. Do whatever makes the theorem true.
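As a rough illustration of that candidate definition, the sketch below spells out the three ingredients as separate pieces. It is only one possible reading: the function `plan`, the thermostat example, and all parameter names are hypothetical, and the "search" is plain enumeration, which is surely too narrow to count as a general definition.

```python
def plan(world_model, goal, candidate_outputs, state):
    """Return the first output the model predicts will reach the goal, else None.

    world_model(state, output) -> predicted next state
    goal(predicted_state)      -> True if the prediction satisfies the goal
    """
    for output in candidate_outputs:
        predicted = world_model(state, output)
        if goal(predicted):
            return output
    return None


# Toy instance (assumed for illustration): a heater controller.
def thermostat_model(temperature, heater_setting):
    # The model believes each heater step raises the temperature by 2 degrees.
    return temperature + 2 * heater_setting


def comfortable(temperature):
    return 20 <= temperature <= 22


choice = plan(thermostat_model, comfortable, candidate_outputs=range(6), state=15)
```

Note that `plan` only checks what the model believes, not what is true of the environment. Whether the output actually reaches the goal depends on how well `world_model` tracks the world, which is where a condition like high mutual information with the environment would have to enter.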

1 Answer

Conjecture: Every short proof of agentic behavior points out agentic architecture.

2 comments

Consider the Sphex wasp, doing the same thing in response to the same stimulus. Would you say that this is not an agent, or would you say that it is part of an agent, and that extended agent did search in a "world model" instantiated in the parts of the world inhabited by ancestral wasps?

At this point, if you allow "world model" to be literally anything with mutual information including other macroscopic situations in the world, and "search" to be any process that gives you information about outcomes, then yes, I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model." As for goals, we can just ignore the apparent goals of the Sphex wasp and define a "real" agent (evolution) to have a goal defined by whatever informative process was at work (survival).

I think I do want to make my agent-like architecture general enough to include evolution. However, there might be a spectrum of agent-like-ness such that you can't get much more than Sphex behavior with just evolution (without having a mesa-optimizer in there).

> I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model."

Yeah, but do you think you can make it feel more like a formal proof?