P₂B: Plan to P₂B Better

Daniel Kokotajlo

We take “planning” to include things that are relevantly similar to this procedure, such as following a bag of heuristics that approximates it.

In theory, optimal policies could be tabularly implemented. In this case, it is impossible for them to further improve their "planning." Yet optimal policies tend to seek power and pursue convergent instrumental subgoals, such as staying alive.

So I'm not (yet) convinced that this frame is useful reductionism for better understanding subgoals. It feels somewhat unnatural to me, although I am also getting a tad more S1-excited about the frame as I type this comment.

In particular, I think this is a great point:

Instrumental goals are about passing the buck: if you are a planner, and you can’t achieve your final goal with a single obvious action (or sequence of actions), you can instead pass the buck to something else, typically your future self. There will often be obvious available actions that put the receiver of the buck “closer” to achieving the final goal than you.

This at least rings true in my experience—when I don't know what to do for my research, I'll idle by "powering up" and reading more textbooks, delegating to my future self (and also allowing time for subconscious brainstorming).

[-]Daniel Kokotajlo4y40

I realized we forgot to put in the footnotes! There was one footnote which was pretty important, I'll put it here because it's related to what you said. It was a footnote after the "make the planners with your goal better at planning" sub-maxim.

This was an “aha” moment for me: Even such everyday actions as “briefly glance up from your phone so you can see where you are going when walking through a building” are instances of following this maxim! You are looking up from your phone so that you can acquire more relevant data (the location of the door, the location of the door handle, etc.) for your immediate-future-self to make use of. Your immediate-future-self will have a slightly better world-model as a result, and thus be better than you at making plans. In particular, your immediate future self will be able, e.g., to choose the correct moment & location to grab the door handle, by contrast with your present self who is looking at Twitter and does not know where to grab.

[-]Daniel Kokotajlo4y20

Footnotes are now added in (thanks Ramana!)

[-]adamShimi4y30

In theory, optimal policies could be tabularly implemented. In this case, it is impossible for them to further improve their "planning."

That sounds wrong. Planning as defined in this post is sufficiently broad that acting like a planner makes you a planner. So if you unwrap a structural planner into a tabular policy, the latter would improve its planning (for example by taking actions that instrumentally help it accomplish the goal we can best ascribe it using the intentional stance).

Another way of framing the point IMO is that the OPs define planning in terms of computation instead of algorithm, and so planning better means facilitating or making the following part of the computation more efficient.
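To make the tabular-policy disagreement above concrete, here is a minimal sketch that is not taken from the post. It assumes a hypothetical five-cell line world with a goal at cell 4; every name in it (`step`, `plan_first_action`, `TABLE`, `rollout`) is invented for illustration. A small search-based planner is "unwrapped" into a lookup table, and the two are then compared by behavior alone.

```python
from collections import deque

# Hypothetical toy world (an assumption for illustration): cells 0..4, goal at 4.
STATES = range(5)
GOAL = 4
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic transition; walls clamp the position to 0..4."""
    return min(max(state + ACTIONS[action], 0), 4)

def plan_first_action(state):
    """Structural planner: breadth-first search for a shortest path to GOAL,
    returning the first action on that path (None if already at GOAL)."""
    if state == GOAL:
        return None
    frontier = deque([(state, None)])  # (current cell, first action taken)
    seen = {state}
    while frontier:
        pos, first = frontier.popleft()
        for action in ACTIONS:
            nxt = step(pos, action)
            if nxt in seen:
                continue
            chosen = first if first is not None else action
            if nxt == GOAL:
                return chosen
            seen.add(nxt)
            frontier.append((nxt, chosen))
    return None

# "Unwrap" the planner into a table: no search happens at run time.
TABLE = {s: plan_first_action(s) for s in STATES}

def rollout(policy, start):
    """Follow a policy (a function from state to action) until GOAL."""
    state, path = start, [start]
    while state != GOAL:
        state = step(state, policy(state))
        path.append(state)
    return path

planner_path = rollout(plan_first_action, 0)
table_path = rollout(TABLE.get, 0)
print(planner_path == table_path)
```

The table and the planner trace out the same trajectory, which is the sense in which the table "acts like a planner." The table has no search procedure left to improve, which is the sense in which its "planning" cannot get better. Which of those two readings matters is exactly what the two comments disagree about.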

[-]Steven Byrnes4y*60

How about "if I contain two subagents with different goals, they should execute Pareto-improving trades with each other"? This is an aspect of "becoming more rational", but it's not very well described by your maxim, because the maxim includes "your goal" as if that's well defined, right?

Unrelated topic: Maybe I didn't read carefully enough, but intuitively I treat "making a plan" and "executing a plan" as different, and I normally treat the word "planning" as referring just to the former, not the latter. Is that what you mean? Because executing a plan is obviously necessary too...

[-]Daniel Kokotajlo4y30

Shooting from the hip: The maxim does include "your goal" as if that's well-defined, yeah. But this is fair, because this is a convergent instrumental goal; a system which doesn't have goals at all doesn't have convergent instrumental goals either. To put it another way: It's built into the definition of "planner" that there is a goal, a goal-like thing, something playing the role of goal, etc.

Anyhow, I would venture to say that insofar as "my subagents should execute Pareto-improving trades" does not in fact further my goal, it's not convergently instrumental, and if it does further my goal, then it's a special case of self-improvement or rationality or some other shard of P₂B.

Re point 2:

We take “planning” to include things that are relevantly similar to this procedure, such as following a bag of heuristics that approximates it. We’re also including actually following the plans, in what might more clunkily be called “planning-acting.”

[-]Martín Soto2y30

we have only said that P2B is the convergent instrumental goal. Whenever there are obvious actions that directly lead towards the goal, a planner should take them instead.

Hmm, given your general definition of planning, shouldn't it include realizations (and their corresponding guided actions) of the form "further thinking about this plan is worse than already acquiring some value now", so that P2B itself already includes acquiring the terminal goal (and optimizing solely for P2B is thus optimal)?

I guess your idea is "plan to P2B better" means "plan with the sole goal of improving P2B", so that it's a "non-value-laden" instrumental goal.

[-]adamShimi4y30

Your proposed reformulation of convergent subgoals sounds interesting, but I see a big flaw in your post: you don't even state the applications you're doing the deconfusion for. And in my book, the applications are THE way of judging whether deconfusion is creating valuable knowledge. So I don't know yet if your framing will help with the sort of problems related to agency and goal-directedness that I think matter.

Reserving judgment until the follow-up posts then.

[-]Daniel Kokotajlo4y40

Fair enough; apologies. We are building to an answer to the question "What is agency and why is it powerful/competitive/incentivised/selected-for." We have a lot more to say on the subject but we decided to break it into pieces; this post is the first piece.

[-]adamShimi4y30

Exciting! Waiting for the next posts even more then.

[-]Daniel Kokotajlo4y20

Don't get your expectations too high, haha. We haven't written the other parts yet; maybe they won't turn out to be that good.

[-]Charlie Steiner4y10

Saying that resource acquisition is in the service of improved planning (because it makes future plans better) seems like a bit of a stretch: you could just as easily say that improved planning is in the service of resource acquisition (because it lets you use resources you couldn't before). "But executing plans is how you get to the goal!" you might say, and "But using your resources is how you get to the goal!" is the reply.

Maybe this is nitpicking, because I agree with you that there is some central thing going on here that is the same whatever you choose to call "more fundamental." Some essence of getting to the goal, even though the world is bigger than me. So I'm looking forward to where this is headed.

[-]Vladimir_Nesov4y*10

more planners

This seems tenuous compared to "more planning substrate". Redundancy and effectiveness specifically through setting up a greater number of individual planners, even if coordinated, is likely an inferior plan. There are probably better uses of hardware that don't have this particular shape.

AI ALIGNMENT FORUM

One Convergent Instrumental Goal to Rule them All

Why P₂B Works

Objections & Nuances

Isn’t this trivial?

Most agents aren’t planners though?

What about the procrastination paradox?

Footnotes
