adamShimi's Shortform

by Adam Shimi, 22nd Jul 2020

A month after writing my post on Focus as a crucial component of goal-directedness, I think I see more clearly what its real point is. You can decompose the proposal in that post into two main ideas:

  • How much a system S is trying to accomplish a goal G can be captured by the distance of S to the set of policies maximally goal-directed towards G.
  • The set of policies maximally directed towards G is the set of policies trained by RL (for every amount of resources above some threshold) on the reward corresponding to G.
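To make the first idea concrete, here is a minimal sketch of what "distance to the set of maximally goal-directed policies" could look like. Everything here is a hypothetical formalization, not from the original post: it assumes a finite state and action space, represents a policy as a matrix of action probabilities per state, uses average total-variation distance as the metric, and treats the maximally G-directed set as given.

```python
import numpy as np

def policy_distance(p, q):
    # Average total-variation distance between two policies,
    # computed state by state. Ranges from 0 (identical) to 1.
    return 0.5 * np.abs(p - q).sum(axis=1).mean()

def focus(policy, maximally_directed_policies):
    # Focus of `policy` toward goal G: one minus the distance to the
    # closest member of the (assumed given) set of maximally
    # G-directed policies. 1 = fully focused on G, 0 = maximally far.
    d = min(policy_distance(policy, m) for m in maximally_directed_policies)
    return 1.0 - d

# Toy example with 2 states and 2 actions: the "maximally directed"
# set contains a single deterministic policy that always picks action 0.
always_a0 = np.array([[1.0, 0.0], [1.0, 0.0]])
uniform   = np.array([[0.5, 0.5], [0.5, 0.5]])

print(focus(always_a0, [always_a0]))  # 1.0: identical to the set
print(focus(uniform,   [always_a0]))  # 0.5: halfway in TV distance
```

The choice of metric matters: total variation over states ignores which states are actually visited, so a behaviorally-weighted distance (e.g. over the policy's own state distribution) might better capture the intuition, but that is beyond this sketch.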

The first idea is what focus is really about. The second doesn't work as well, and multiple people pointed out issues with it. But I still find the underlying idea powerful: focus on a goal measures the similarity with a set of policies that only try to accomplish this goal.

Now the big question left is: can we define the set of policies maximally goal-directed towards G in a clean way that captures our intuitions?