# 8

Frontpage
AIRationality

When asked about what it means for a system to be goal-directed, one common answer draws on some version of Dennett’s intentional stance: a goal-directed system is a system such that modeling it as having a goal provides accurate and efficient predictions about its behavior. I agree up to that point. But then, some people follow up by saying that the prediction is that the system will accomplish its goal. For example, it makes sense to model AlphaGo as goal-directed towards winning at Go, because it will eventually win. And taking the intentional stance allows me to predict that.

But what if I make AlphaGo play against AlphaZero, which is strictly better at Go? Then AlphaGo will consistently lose. Does it mean that it’s no longer goal-directed towards winning?

What feels wrong to me is the implicit link drawn between goal-directedness and competence. A bad Go player will usually lose, but it doesn’t seem any less goal-directed to me than a stronger one that consistently wins.

Competence is thus not the whole story. It might be useful to compute goal-directedness; reaching some lower-bound of competency might even be a necessary condition for goal-directedness (play badly enough and it becomes debatable whether you're even trying to win). But when forcing together the two, I feel like something important is lost.

To solve this problem, I propose a new metric of goal-directedness, focus: how much is the system trying to accomplish a certain goal. Focus is not the whole story about being goal-directed, but I think computing the focus of a system for some goal (details in the next paragraph) gives useful information about its goal-directedness.

Given a system (as a function from states or histories to actions) and a goal (as a set of states), here are the steps to compute the focus of towards .

• I define a reward function over states valued 1 at states in and 0 at all other states.
• Then I define be the set of all policies that can be generated by Reinforcement Learning (RL) on . I’ll go into details about below, but the most important part here is that it isn't limited to optimal policies; I also consider policies of RL with “few resources”. Basically all policies at intermediary steps of the RL training are in .
• Lastly, I pick a distance between policies. If the two policies are deterministic, a Hamming distance will do. If they are stochastic, maybe some vector distance based on the Kullback-Leibler divergence.
• Then, the focus of towards is inversely proportional to the distance between and .

The intuition here is that any policy that result from training on this reward function is aiming maximally towards the goal, by definition. And by taking the appropriate distance, we can measure how far our system is from such a fully focused policy. The distance captures the proportion of actions taken by the policy that fits with aiming towards the specific goal.

Of course, there are many points that need further thought:

• What does “all policies given by using RL” mean in this case? The easy answer is all policies resulting from taking any RL method and any initial conditions, and training for any amount of resources on the reward function of the goal. But not only is this really, really uncomputable, I’m not sure it’s well defined enough what are “all methods of RL”?). Ideally, I would want to limit the study to one specific RL algorithm (SARSA for example) and then the set of generated policies would be well-defined. But I’m not sure if I’m losing any policy by doing so.
• Even when fixing some RL algorithm, it is completely unfeasible to consider all initial conditions and amounts of resources. Yet this is the obvious way to compute the set of maximally-focused policies. Here I hope for either a dense subset (or a good approximation) of this set of policies, or even an analytical characterization if ones exists.
• The ghost of competence strikes back here, because I cannot really consider any amount of resources; if I did, then every policy would be maximally-focused for the goal, as it would be generated by taking the policy as an initial condition and using no resources at all. My intuition for dealing with this is that there should be a meaningful lower bound on the amount of resources the RL algorithm has to use before the resulting policy is indeed maximally-focused. Maybe enough resource for all state values or state-action pairs value to have been updated at least once?

Finally, assuming we are able to compute the focus of any system for any goal, how to interpret the results we get? Focus is not divided between goals like probability: for example, the full goal consisting of all possible states always has maximal focus, as all policies are optimal for the corresponding reward; but other goals might also have the same focus. This entails that finding the most representative goal is not only about focus, but also about the triviality of the goal.

My far less clean intuition here is that the “triviality” of the goal should weight its focus. That is, the goal consisting of all possible states is trivial, whereas the one consisting of exactly one state is not trivial at all. Thus even if the former has stronger focus than the latter, it has to be really, really stronger to compensate its triviality. Or said another way, a non-trivial goal with a small but not negligible focus exhibits goal-directedness than a trivial goal with enormous focus.

Even with all those uncertainties, I still believe focus is a step in the right direction. It trims down competence to the part that seems the most relevant to goal-directedness. That being said, I am very interested in any weakness of the idea, or any competing intuition.

Thanks to Jérémy Perret for feedback on the writing, and to Joe Collman, Michele Campolo and Sabrina Tang for feedback on the idea.