AI ALIGNMENT FORUM

Embedded Agency

Robust Delegation

by Abram Demski, Scott Garrabrant
4th Nov 2018

(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here)

Previous: Embedded World-Models
Next: Subsystem Alignment
2 comments, sorted by top scoring
Scott Garrabrant

Some last-minute emphasis:

We kind of open with how agents have to grow and learn and be stable, but talk most of the time about a two-agent problem, where there is an initial agent and a successor agent. When thinking of it as the succession problem, it seems like a bit of a stretch to call it a fundamental part of agency. The first two sections were about how agents have to make decisions and have models, and choosing a successor does not seem like as fundamental a part of agency. However, when you think of it as an agent having to stably continue to optimize over time, it seems a lot more fundamental.

So, I want to emphasize that when we say there are multiple forms of the problem, like choosing successors or learning/growing over time, the view on which these are different at all is a dualistic view. To an embedded agent, the future self is not privileged; it is just another part of the environment, so there is no difference between making a successor and preserving your own goals.

It feels very different to humans. This is because it is much easier for us to change ourselves over time than it is to make a clone of ourselves and change the clone, but that difference is not fundamental.

David Manheim

I want to expand a bit on adversarial Goodhart, which this post describes as another agent actively attempting to make the metric fail, and which the paper I wrote with Scott split into several sub-categories, but which I now think of in somewhat simpler terms: nothing special is happening in the multi-agent setting in terms of metrics or models; it's the same three failure modes (regressional, extremal, and causal) that we see in the single-agent case.

What changes more fundamentally is that there are now coordination problems, resource contention, and game-theoretic dynamics that can make the problem much worse in practice. I'm beginning to think of these multi-agent issues as closely related to the other parts of embedded agency (needing small models of complex systems, reflexive consistency, and needing self-models), as well as to issues less intrinsically about embedded agency: coordination problems and game-theoretic competition.
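
As a minimal sketch of the simplest of those single-agent failure modes, regressional Goodhart, here is a toy simulation in which the proxy metric is the true value plus independent noise; the setup and all names and parameters are illustrative assumptions, not taken from the post or the paper:

```python
import random

# Toy regressional-Goodhart demo (illustrative assumption: proxy = value + noise).
random.seed(0)

N = 100_000   # candidate options
TOP_K = 100   # how hard we optimize: keep only the top-scoring candidates

true_values = [random.gauss(0, 1) for _ in range(N)]
proxies = [v + random.gauss(0, 1) for v in true_values]  # noisy measurement

# Select the candidates with the highest proxy scores.
ranked = sorted(range(N), key=lambda i: proxies[i], reverse=True)
selected = ranked[:TOP_K]

avg_proxy = sum(proxies[i] for i in selected) / TOP_K
avg_true = sum(true_values[i] for i in selected) / TOP_K

print(f"mean proxy score of selected: {avg_proxy:.2f}")
print(f"mean true value of selected:  {avg_true:.2f}")
# The selected candidates' proxy scores run well ahead of their true values:
# hard selection on the proxy preferentially picks up measurement noise.
```

With both variances equal, the expected true value of a selected candidate is roughly half its proxy score, and the gap widens the harder the proxy is optimized (larger N, smaller TOP_K).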
