Embedded Agents

by Abram Demski, Scott Garrabrant
29th Oct 2018
(A longer text-based version of this post is also available on MIRI's blog here, and the bibliography for the whole sequence can be found here.)
7 comments
Rohin Shah (Nomination for 2018 Review)

I actually have some understanding of what MIRI's Agent Foundations work is about

Wei Dai

I think it would be useful to give your sense of how Embedded Agency fits into the more general problem of AI Safety/Alignment. For example, what percentage of the AI Safety/Alignment problem do you think Embedded Agency represents, and what are the other major chunks of the larger problem?

Scott Garrabrant

This is not a complete answer, but it is part of my picture:

(It is the part of the picture that I can give while being only descriptive, and not prescriptive. For epistemic hygiene reasons, I want to avoid discussions of how much of each approach we need in contexts (like this one) that would make me feel like I was justifying my research in a way that people might interpret as an official statement from the agent foundations team lead.)

I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central, curiosity-based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as "Here are a bunch of things we wish we understood about aligning AI," and is now repackaged as "Here is a central mystery of the universe, and here are a bunch of things we don't understand about it." It is not a coincidence that they are the same problems, since they were generated in the first place by people paying close attention to what mysteries of the universe related to AI we haven't solved yet.

I think of Agent Foundations research as having a different type signature than most other AI Alignment research, in a way that looks kind of like Agent Foundations:other AI alignment::science:engineering. I think of AF as more forward-chaining and other stuff as more backward-chaining. This may seem backwards if you think about AF as reasoning about superintelligent agents, and other research programs as thinking about modern ML systems, but I think it is true. We are trying to build up a mountain of understanding, until we collect enough that the problem seems easier. Others are trying to make direct plans for what we need to do, see what is wrong with those plans, and try to fix the problems. One consequence of this is that AF work is more likely to be helpful given long timelines, partially because AF is trying to be the start of a long journey of figuring things out, but also because AF is more likely to be robust to huge shifts in the field.

I actually like to draw an analogy with this (taken from this post by Evan Hubinger):

I was talking with Scott Garrabrant late one night recently and he gave me the following problem: how do you get a fixed number of DFA-based robots to traverse an arbitrary maze (if the robots can locally communicate with each other)? My approach to this problem was to come up with and then try to falsify various possible solutions. I started with a hypothesis, threw it against counterexamples, fixed it to resolve the counterexamples, and iterated. If I could find a hypothesis which I could prove was unfalsifiable, then I'd be done.
When Scott noticed I was using this approach, he remarked on how different it was than what he was used to when doing math. Scott's approach, instead, was to just start proving all of the things he could about the system until he managed to prove that he had a solution. Thus, while I was working backwards by coming up with possible solutions, Scott was working forwards by expanding the scope of what he knew until he found the solution.

(I don't think it quite communicates my approach correctly, but I don't know how to do better.)
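To make the puzzle in the quoted passage a bit more concrete, here is a minimal sketch, not from the post, of the kind of setup Evan is describing: a constant-memory robot with only local sensing in a grid maze, plus one candidate strategy (a right-hand wall-follower) playing the role of a hypothesis you would then try to falsify with counterexample mazes. The maze layout, function names, and the single-robot simplification are all illustrative assumptions, not MIRI's formulation.

```python
# Hypothetical illustration of the "finite-state robots in a maze" puzzle.
# Maze: '#' = wall, '.' = open, 'S' = start, 'G' = goal.
MAZE = [
    "#########",
    "#S..#...#",
    "##.##.#.#",
    "#..#..#.#",
    "#.##.##.#",
    "#......G#",
    "#########",
]

DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def find(ch):
    """Locate the first cell containing the given character."""
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == ch:
                return r, c


def open_cell(r, c):
    """True if (r, c) is inside the grid and not a wall."""
    return 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#"


def right_hand_robot(max_steps=10_000):
    """Candidate strategy: a single bounded-memory 'right-hand rule' robot.

    Its only state is (position, heading), i.e. the kind of constant-memory
    agent the puzzle is about.  The hypothesis 'this always reaches the goal'
    is the sort of claim you would then try to falsify with counterexample
    mazes (for instance, a goal surrounded by a free-standing wall island).
    """
    pos, heading = find("S"), 1  # start facing right
    goal = find("G")
    for _ in range(max_steps):
        if pos == goal:
            return True
        # Prefer turning right, then going straight, then left, then back.
        for turn in (1, 0, -1, 2):
            d = (heading + turn) % 4
            nr, nc = pos[0] + DIRS[d][0], pos[1] + DIRS[d][1]
            if open_cell(nr, nc):
                pos, heading = (nr, nc), d
                break
    return False  # hypothesis falsified within the step budget


if __name__ == "__main__":
    print("reached goal:", right_hand_robot())
```

The actual puzzle asks about a fixed team of such robots on arbitrary mazes with local communication; a lone wall-follower is exactly the kind of hypothesis that fails on some mazes (e.g. when the goal is enclosed by walls disconnected from the ones it is following), which is what makes the falsification loop Evan describes do real work.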

A consequence of the type signature of Agent Foundations is that my answer to "What are the other major chunks of the larger problem?" is "That is what I am trying to figure out."

orthonormal (Review for 2018 Review)

Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation of MIRI's research agenda (as of 2018) that currently exists.

johnswentworth (Nomination for 2018 Review)

This post (and the rest of the sequence) was the first time I had ever read something about AI alignment and thought that it was actually asking the right questions. It is not about a sub-problem, it is not about marginal improvements. Its goal is a gears-level understanding of agents, and it directly explains why that's hard. It's a list of everything which needs to be figured out in order to remove all the black boxes and Cartesian boundaries, and understand agents as well as we understand refrigerators.

David Manheim (Nomination for 2018 Review)

This post has significantly changed my mental model of how to understand key challenges in AI safety, and also given me a clearer understanding of and language for describing why complex game-theoretic challenges are poorly specified or understood. The terms and concepts in this series of posts have become a key part of my basic intellectual toolkit.

Ben Pace (Nomination for 2018 Review)

This sequence was the first time I felt I understood MIRI's research.

(Though I might prefer to nominate the text-version that has the whole sequence in one post.)

