Scott Garrabrant


Comments

Why Agent Foundations? An Overly Abstract Explanation

I mostly agree with this post.

Figuring out the True Name of a thing, a mathematical formulation sufficiently precise that one can apply lots of optimization pressure without the formulation breaking down, is absolutely possible and does happen.

Precision feels pretty far from the true name of the important feature of true names. I am not quite sure what precision means, but on one definition, precision is the opposite of generality, and true names seem anti-precise. I am not saying precision is not a virtue, and it does seem like precision is involved (maybe precision on some meta level?).

The second half about robustness to optimization pressure is much closer, but still not right. (I think it is a pretty direct consequence of true names.) It is clearly not yet a true name, in the same way that "it is robust to people trying to push it" is not the true name of inertia.

The Plan

I agree with this asymmetry. 

One thing I am confused about is whether to think of the E. coli as qualitatively different from the human. The E. coli is taking actions that can be well modeled by an optimization process searching for actions that would be good if this optimization process output them, which has some reflection in it.

It feels like the E. coli can behaviorally be well modeled this way, but is mechanistically not shaped like this. I feel like the mechanistic fact is more important, but we are much closer to having behavioral definitions of agency than mechanistic ones.

The Plan

Which isn't *that* large an update. The average number of agent foundations researchers at MIRI over the last decade (that are public-facing enough that you can update on their lack of progress) is something like 4.

Figuring out how to factor in researcher quality is hard, but it seems plausible to me that the amount of quality-adjusted attention directed at your subgoal over the next decade is significantly larger than the amount of attention directed at it over the last decade. (Which would not all come from you. I do think that Agent Foundations today is non-trivially closer to John today than Agent Foundations 5 years ago is to John today.)

It seems accurate to me to say that Agent Foundations in 2014 was more focused on reflection, which shifted towards embeddedness, and then shifted towards abstraction, and that these things all flow together in my head, and so Scott thinking about abstraction will have more reflection mixed in than John thinking about abstraction. (Indeed, I think progress on abstraction would have huge consequences on how we think about reflection.)

In case it is not obvious to people reading, I endorse John's research program. (Which can maybe be inferred from the fact that I am arguing that it is similar to my own.) I think we disagree about what the most likely path is after becoming less confused about agency, but that part of both our plans is yet to be written, and I think the subgoal is a simple enough concept that I don't think disagreements about what to do next have a strong impact on how to do the first step.

The Plan

To operationalize, I claim that MIRI has been directed at a close enough target to yours that you probably should update on MIRI's lack of progress at least as much as you would if MIRI was doing the same thing as you, but for half as long.

The Plan

Hmm, yeah, we might disagree about how much reflection (self-reference) is a central part of agency in general.

It seems plausible that it is important to distinguish between the E. coli and the human along a reflection axis (or even more so, to distinguish between evolution and a human). Then maybe you are more focused on the general class of agents, and MIRI is more focused on the more specific class of "reflective agents."

Then, there is the question of whether reflection is going to be a central part of the path to (F/D)OOM.

Does this seem right to you?

The Plan

I want to disagree about MIRI. 

Mostly, I think that MIRI (or at least a significant subset of MIRI) has always been primarily directed at agenty systems in general.

I want to separate agent foundations at MIRI into three eras: the Eliezer Era (2001-2013), the Benya Era (2014-2016), and the Scott Era (2017-).

The transitions between eras had an almost complete overhaul of the people involved. In spite of this, I believe that they have roughly all been directed at the same thing, and that John is directed at the same thing.

The proposed mechanism behind the similarity is not transfer, but instead that agency in general is a convergent/natural topic.

I think throughout time, there has always been a bias in the pipeline from ideas to papers towards being more about AI. I think this bias has gotten smaller over time, as the agent foundations research program both started having stable funding, and started carrying less and less of the weight of all of AI alignment on its back. (Before going through editing with Rob, I believe Embedded Agency had no mention of AI at all.)

I believe that John thinks that the Embedded Agency document is especially close to his agenda, so I will start with that. (I also think that both John and I currently have more focus on abstraction than what is in the Embedded Agency document).

Embedded Agency, more so than anything else I have done, was generated using an IRL-shaped research methodology. I started by taking the stuff that MIRI had already been working on, mostly the artifacts of the Benya Era, and trying to communicate the central justification that would cause one to be interested in these topics. I think that I did not invent a pattern, but instead described a preexisting pattern that originally generated the thoughts.

This would be consistent with the pattern being about agency in general even if the ideas had been generated based on agency in AI, and so I could have found the pattern there anyway, but I think this is not the case. I think the use of proof-based systems demonstrates an extreme disregard for the substrate that the agency is made of. I claim that the reason for the historic focus on proof-based agents is that they were a system we could actually say stuff about. The fact that real-life agents looked very different on the surface from proof-based agents was a shortfall that most people would use to completely reject the system, but MIRI would work in it because what they really cared about was agency in general, and having another system that is easy to say things about could be used to triangulate agency in general. If MIRI were directed at a specific type of agency, they would have rejected the proof-based systems as being too different.

I think that MIRI is often misrepresented as believing in GOFAI, because people look at the proof-based systems and think that MIRI would only study those if they thought that is what AI might look like. I think in fact the reason for the proof-based systems is that, at the time, they were the most fruitful models we had, and we were just very willing to use any lens that worked when trying to look at something very, very general.

(One counterpoint here is that maybe MIRI didn't care about the substrate the agency was running on, but did have a bias towards singleton-like agency rather than very distributed systems. I think this is slightly true. Today, I think that you need to understand the distributed systems, because realistic singleton-like agents follow many of the same rules, but it is possible that early MIRI did not believe this as much.)

Most of the above was generated by looking at the Benya Era, and trying to justify that it was directed at agency in general at least/almost as much as the Scott Era, which seems like the hardest of the three for me.

For the Scott Era, I have introspection. I sometimes stop thinking in general and focus on AI. This is usually a bad idea, doesn't generate as much fruit, and is usually not what I do.

For the Eliezer Era, just look at the sequences. 

I just looked up and reread, and tried to steelman what you originally wrote. My best steelman is that you are saying that MIRI is trying to develop a prescriptive understanding of agency, and you are trying to develop a descriptive understanding of agency. There might be something to this, but it is really complicated. One way to define agency is as the pipeline from the prescriptive to the descriptive, so I am not sure that prescriptive vs. descriptive agency makes sense as a distinction.

As for the research methodology, I think that we all have pretty different research methodologies. I do not think Benya and Eliezer and I have especially more in common with each other than we do with John, but I might be wrong here. I also don't think Sam and Abram and Tsvi and I have especially more in common with each other in terms of research methodologies, except insofar as we have been practicing working together.

In fact, the thing that might be going on here is that the distinctions in topics are coming from differences in research skills. Maybe proof-based systems are the most fruitful model if you are a Benya, but not if you are a Scott or a John. But this is about what is easiest for you to think about, not about a difference in the shared convergent subgoal of understanding agency in general.

Countably Factored Spaces

Note that the title is misleading. This is really about countable-dimension factored spaces, which is much better, since it allows for the possibility of something kind of like continuous time, where between any two points in time you can specify a time strictly between them.
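As a toy illustration of that density property (not from the post itself): the rationals are a countable order in which, between any two distinct times, a strictly intermediate time always exists, so no two times are adjacent. A minimal sketch:

```python
from fractions import Fraction

def between(a, b):
    """Return a time strictly between a and b, assuming a < b.

    The midpoint of two rationals is rational, so this never leaves
    the (countable) set of rational "times".
    """
    return (a + b) / 2

t0, t1 = Fraction(0), Fraction(1)
t = between(t0, t1)   # Fraction(1, 2): strictly between t0 and t1
```

Iterating `between` shows the order has no smallest step: you can refine between any two times forever, which is the "kind of like continuous time" being gestured at.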

Finite Factored Sets: Inferring Time

Yeah, also note that the history of X given Y is not actually a well-defined concept. There is only the history of X given y, for each y in Y. You could define it to be the union of all of those, but that would not actually be used in the definition of orthogonality. In this case, the histories of X given y and of V given y are all independent of the choice of y, but in general, you should be careful about that.
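For readers without the sequence open: the *unconditional* history and orthogonality notions can be sketched by brute force. The encoding below is a hypothetical toy (points of S as pairs, factors as functions to block labels), and it takes the history of a partition X to be the smallest set of factors whose values determine X, with orthogonality as disjointness of histories; the conditional versions discussed above are more subtle and are not shown.

```python
from itertools import combinations

# Toy finite factored set: S = {0,1} x {0,1}, with one factor per coordinate.
S = [(i, j) for i in range(2) for j in range(2)]
factors = {
    "b0": lambda s: s[0],  # value of the first coordinate
    "b1": lambda s: s[1],  # value of the second coordinate
}

def determines(names, X):
    """True if agreeing on every factor in `names` forces agreement on X."""
    return all(
        X(s) == X(t)
        for s in S for t in S
        if all(factors[n](s) == factors[n](t) for n in names)
    )

def history(X):
    """Smallest set of factors whose values determine the partition X."""
    for k in range(len(factors) + 1):
        for names in combinations(factors, k):
            if determines(names, X):
                return set(names)

def orthogonal(X, V):
    """X and V are orthogonal iff their histories are disjoint."""
    return history(X).isdisjoint(history(V))

X = lambda s: s[0]  # partition by first coordinate
V = lambda s: s[1]  # partition by second coordinate
# history(X) == {"b0"}, history(V) == {"b1"}, so X and V are orthogonal
```

The brute-force search over subsets is only meant to make the definitions concrete; minimality of the history has to be justified separately in the actual framework.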

Finite Factored Sets: Inferring Time

I think that works; I didn't look very hard. Your histories of X given Y and V given Y are wrong, but it doesn't change the conclusion.
