Hierarchical planning: context agents

[-]Steven Byrnes5y*50

I think maybe a more powerful framework than discrete contexts is that there's a giant soup of models, and the models have arrows pointing at other models, and multiple models can be active simultaneously, and the models can span different time scales. So you can have a "I am in the store" model, and it's active for the whole time you're shopping, and meanwhile there are faster models like "I am looking for noodles", and slower models like "go shopping then take the bus home". And anything can point to anything else. So then if you have a group of models that mainly point to each other, and less to other stuff, it's a bit of an island in the graph, and you can call it a "context". Like everything I know about chess strategy is mostly isolated from the rest of my universe of knowledge and ideas, so I could say I have a "chess strategy context". But that's an emergent property, not part of the data structure.

My impression is that the Goodhart's law thing at the end is a bit like saying "Don't think creatively"... Thinking creatively is making new connections where they don't immediately pop into your head. Is that reasonable? Sorry if I'm misunderstanding. :)

[-]Charlie Steiner5y30

Yeah, I agree, it seems both more human-like and more powerful to have a dynamical system where models are activating other models based on something like the "lock and key" matching of neural attention. But for alignment purposes, it seems to me that we need to not only optimize models for usefulness or similarity to actual human thought, but also for how similar they are to how humans think of human thought - when we imagine an AI with the goal of doing good, we want it to have decision-making that matches our understanding of "doing good." The model in this post isn't as neat and clean as utility maximization, but a lot of the overly-neat features have to do with making it more convenient to talk about it having a fixed, human-comprehensible goal.

Re: creativity, I see how you'd get that from what I wrote but I think that's only half right. The model laid out in this post is perfectly capable of designing new solutions to problems - it just tends to do it by making a deliberate choice to take a "design a new solution" action. Another source of creativity is finding surprising solutions to difficult search problems, which is perfectly possible in complicated contexts.

Another source of creativity is compositionality, which you can have in this formalism by attributing it to the transition function putting you ino to a composed context. Can you learn this while trying to mimic humans? I'm not sure, but it seems possible.

We might also attribute a deficit in creativity to the fact that the reward functions are only valid in-context, and aren't designed to generalize to new states, even if there were really apt ways of thinking about the world that involved novel contexts or adding new states to existing contexts. And maybe this is the important part, because I think this is a key feature, not at all a bug.

[-]xuan5y*30

In exchange for the mess, we get a lot closer to the structure of what humans think when they imagine the goal of "doing good." Humans strive towards such abstract goals by having a vague notion of what it would look and feel like, and by breaking down those goals into more concrete sub-tasks. This encodes a pattern of preferences over universe-histories that treats some temporally extended patterns as "states."

Thank you for writing this post! I've had very similar thoughts for the past year or so, and I think the quote above is exactly right. IMO, part of the alignment problem involves representational alignment -- i.e., ensuring that AI systems accurately model both the abstract concepts we use to understand the world, as well as the abstract tasks, goals, and "reasons for acting" that humans take as instrumental or final ends. Perhap's you're already familiar with Bratman's work on Intentions, Plans, & Practical Reason, but to the the extent that "intentions" feature heavily in human mental life as the reasons we cite for why we do things, developing AI models of human intention feels very important.

As it happens, one of the next research projects I'll be embarking on is modeling humans as hierarchical planners (most likely in the vein of Hierarchical Task & Motion Planning in the Now by Kaelbling & Lozano-Perez) in order to do Bayesian inference over their goals and sub-goals -- would be happy to chat more about it if you'd like!

[-]Charlie Steiner5y30

Oh wait, are you the first author on this paper? I didn't make the connection until I got around to reading your recent post.

So when you talk about moving to a hierarchical human model, how practical do you think it is to also move to a higher-dimensional space of possible human-models, rather than using a few hand-crafted goals? This necessitates some loss function or prior probability over models, and I'm not sure how many orders of magnitude more computationally expensive it makes everything.

[-]xuan5y10

Yup! And yeah I think those are open research questions -- inference over certain kinds of non-parametric Bayesian models is tractable, but not in general. What makes me optimistic is that humans in similar cultures have similar priors over vast spaces of goals, and seem to do inference over that vast space in a fairly tractable manner. I think things get harder when you can't assume shared priors over goal structure or task structure, both for humans and machines.

[-]Charlie Steiner5y10

Sorry for being slow :) No, I haven't read anything of Bratman's. Should I? The synopsis looks like it might have some interesting ideas but I'm worried he could get bogged down in what human planning "really is" rather than what models are useful.

I'd totally be happy to chat either here or in PMs. Full Bayesian reasoning seems tricky if the environment is complicated enough to make hierarchical planning attractive - or do you mean optimizing a model for posterior probability (the prior being something like MML?) by local search?

I think one interesting question there is if it can learn human foibles. For example, suppose we're playing a racing game and I want to win the race, but fail because my driving skills are bad. How diverse a dataset about me do you need to actually be able to infer that a) I am capable of conceptualizing how good my performance is b) I wanted it to be good c) It wasn't good, from a hierarchical perpective, because of the lower-level planning faculties I have. I think maybe you could actually learn this only from racing game data (no need to make an AGI that can ask me about my goals and do top-down inference), so long as you had diverse enough driving data to make the "bottom-up" generalization that my low-level driving skill can be modeled as bad almost no matter the higher-level goal, and therefore it's simplest to explain me not winning a race by taking the bad driving I display elsewhere as a given and asking what simple higher-level goal fits on top.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

10

Hierarchical planning: context agents

10

I - intro

II - description

III - more specific description

IV - motivation