All of xuan's Comments + Replies

Building brain-inspired AGI is infinitely easier than understanding the brain

This was a great read! I wonder how much you're committed to "brain-inspired" vs "mind-inspired" AGI, given that the approach to "understanding the human brain" you outline seems to correspond to Marr's computational and algorithmic levels of analysis, as opposed to the implementational level (see link for reference). In which case, some would argue, you don't necessarily have to do too much neuroscience to reverse engineer human intelligence. A lot can be gleaned by doing classic psychological experiments to validate the functional roles of various aspect... (read more)

2Steve Byrnes1y
Thanks! I guess my feeling is that we have a lot of good implementation-level ideas (and keep getting more), and we have a bunch of algorithm ideas, and psychology ideas and introspection and evolution and so on, and we keep piecing all these things together, across all the different levels, into coherent stories, and that's the approach I think will (if continued) lead to AGI. Like, I am in fact very interested in "methods for fast and approximate Bayesian inference" as being relevant for neuroscience and AGI, but I wasn't really interested in it until I learned a bunch of supporting ideas about what part of the brain is doing that, and how it works on the neuron level, and how and when and why that particular capability evolved in that part of the brain. Maybe that's just me. I haven't seen compelling (to me) examples of people going successfully from psychology to algorithms without stopping to consider anything whatsoever about how the brain is constructed. Hmm, maybe very early Steve Grossberg stuff? But he talks about the brain constantly now. One reason it's tricky to make sense of psychology data on its own, I think, is the interplay between (1) learning algorithms, (2) learned content (a.k.a. "trained models"), (3) innate hardwired behaviors (mainly in the brainstem & hypothalamus). What you especially want for AGI is to learn about #1, but experiments on adults are dominated by #2, and experiments on infants are dominated by #3, I think.
Hierarchical planning: context agents

Yup! And yeah I think those are open research questions -- inference over certain kinds of non-parametric Bayesian models is tractable, but not in general. What makes me optimistic is that humans in similar cultures have similar priors over vast spaces of goals, and seem to do inference over that vast space in a fairly tractable manner. I think things get harder when you can't assume shared priors over goal structure or task structure, both for humans and machines.
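
To make the "shared prior over goals" point concrete, here is a minimal sketch of Bayesian goal inference by enumeration. Everything in it is an illustrative assumption rather than anything from the post or the paper: the 1-D world, the three candidate goals, the Boltzmann-rational action model, and the names `action_likelihood` and `goal_posterior`. Enumeration like this is only tractable because the goal set is tiny; over a vast or non-parametric goal space you would need approximate inference instead.

```python
import math

def action_likelihood(pos, action, goal, beta=2.0):
    """P(action | pos, goal) for a noisily rational agent on a 1-D line.

    Hypothetical model: actions are steps of -1 or +1, and the agent
    prefers steps that reduce its distance to the goal (softmax with
    inverse temperature beta).
    """
    def progress(a):
        return abs(goal - pos) - abs(goal - (pos + a))
    weights = {a: math.exp(beta * progress(a)) for a in (-1, 1)}
    return weights[action] / sum(weights.values())

def goal_posterior(prior, start, actions, beta=2.0):
    """Exact posterior over a finite goal set, updated one action at a time."""
    post = dict(prior)
    pos = start
    for a in actions:
        for g in post:
            post[g] *= action_likelihood(pos, a, g, beta)
        pos += a
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

# Assumed shared prior: the observer thinks goal 3 is the most common one.
prior = {0: 0.25, 3: 0.5, 10: 0.25}
posterior = goal_posterior(prior, start=5, actions=[1, 1, 1])
most_likely_goal = max(posterior, key=posterior.get)
```

Three steps to the right from position 5 are enough to overturn the prior in favour of the goal at 10. If observer and agent did not share the prior (or the goal set itself), the same observations could support a very different conclusion, which is the harder case mentioned above.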

The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables

Belatedly reading this and have a lot of thoughts about the connection between this issue and robustness to ontological shifts (which I've written a bit about here), but I wanted to share a paper which takes a very small step in addressing some of these questions by detecting when the human's world model may diverge from a robot's world model, and using that as an explanation for why a human might seem to be acting in strange or counter-productive ways:

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior
Siddharth Reddy, Anca D.

... (read more)
Writing Causal Models Like We Write Programs

Belatedly seeing this post, but I wanted to note that probabilistic programming languages (PPLs) are centered around this basic idea! Some useful links and introductions to PPLs as a whole:
- Probabilistic models of cognition (web book)
- WebPPL
- An introduction to models in Pyro
- Introduction to Modeling in Gen
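
For readers who haven't used a PPL, the core idea can be sketched in plain Python without any of the libraries above: the causal model is an ordinary program that samples its variables in causal order, and inference is a generic procedure that runs the program many times. The sprinkler model, its probabilities, and the helper names `flip` and `infer` are all assumptions made up for this illustration. Real PPLs replace the naive rejection sampler here with far more efficient inference algorithms.

```python
import random

def flip(p, rng):
    """Return True with probability p."""
    return rng.random() < p

def sprinkler_model(rng):
    """A causal model written as a program: one line per mechanism."""
    rain = flip(0.2, rng)
    # Assumed mechanism: the sprinkler only ever runs on dry days.
    sprinkler = flip(0.4, rng) if not rain else False
    wet_grass = rain or sprinkler
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}

def infer(model, condition, query, n=20000, seed=0):
    """Rejection sampling: keep runs that satisfy the condition, average the query."""
    rng = random.Random(seed)
    kept = [trace for trace in (model(rng) for _ in range(n)) if condition(trace)]
    return sum(query(trace) for trace in kept) / len(kept)

# Estimate P(rain | wet_grass). Analytically this is 0.2 / 0.52, about 0.385.
p_rain_given_wet = infer(
    sprinkler_model,
    condition=lambda t: t["wet_grass"],
    query=lambda t: t["rain"],
)
```

Because the model is just code, editing it (for instance, replacing the line that samples `sprinkler` with a constant) is a natural way to express an intervention.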

And here's a really fascinating paper by some of my colleagues that tries to model causal interventions that go beyond Pearl's do-operator, by formalizing causal interventions as (probabilistic) program transformations:

Bayesian causal inference via pr

... (read more)
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy

Replying to the specific comments:

This still seems like a fair way to evaluate what the alignment community thinks about, but I think it is going to overestimate how parochial the community is. For example, if you go by "what does Stuart Russell think is important", I expect you get a very different view on the field, much of which won't be in the Alignment Newsletter.

I agree. I intended to gesture a little bit at this when I mentioned that "Until more recently, It’s also been excluded and not taken very seriously within traditional academia", because I th... (read more)

5Rohin Shah1y
Re: worries about "reward", I don't feel like I have a great understanding of what your worry is, but I'd try to summarize it as "while the abstraction of reward is technically sufficiently expressive, 1) it may not have the right inductive biases, and so the framework might fail in practice, and 2) it is not a good framework for thought, because it doesn't sufficiently emphasize many important concepts like logic and hierarchical planning". I think I broadly agree with those points if our plan is to explicitly learn human values, but it seems less relevant when we aren't trying to do that and are instead trying to In this framework, "knowledge about what humans want" doesn't come from a reward function, it comes from something like GPT-3 pretraining. The AI system can "invent" whatever concepts are best for representing its knowledge, which includes what humans want. Here, reward functions should instead be thought of as akin to loss functions -- they are ways of incentivizing particular kinds of outputs. I think it's reasonable to think on priors that this wouldn't be sufficient to get logical / hierarchical behavior, but I think GPT and AlphaStar and all the other recent successes should make you rethink that judgment. ---- I agree that trend-following behavior exists. I agree that this means that work on deep learning is less promising than you might otherwise think. That doesn't mean it's the wrong decision; if there are a hundred other plausible directions, it can still be the case that it's better to bet on deep learning rather than try your hand at guessing which paradigm will become dominant next. To quote Rodney Brooks: He also predicts that the "next big thing" will happen by 2027 (though I get the sense that he might count new kinds of deep learning architectures as a "big thing" so he may not be predicting something as paradigm-shifting as you're thinking). Whether to diversify dep
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy

Thanks for this summary. Just a few things I would change:

  1. "Deep learning" instead of "deep reinforcement learning" at the end of the 1st paragraph -- this is what I meant to say, and I'll update the original post accordingly.
  2. I'd replace "nice" with "right" in the 2nd paragraph.
  3. "certain interpretations of Confucian philosophy" instead of "Confucian philosophy", "the dominant approach in Western philosophy" instead of "Western philosophy" -- I think it's important not to give the impression that either of these is a monolith.
3Rohin Shah1y
Done :)
AI Alignment, Philosophical Pluralism, and the Relevance of Non-Western Philosophy

Thanks for these thoughts! I'll respond to your disagreement with the framework here, and to the specific comments in a separate reply.

First, with respect to my view about the sources of AI risk, the characterization you've put forth isn't quite accurate (though it's a fair guess, since I wasn't very explicit about it). In particular:

  1. These days I'm actually more worried by structural risks and multi-multi alignment risks, which may be better addressed by AI governance than technical research per se. If we do reach super-intelligence, I think it's more like
... (read more)
5Rohin Shah1y
I agree with you on 1 and 2 (and am perhaps more optimistic about not building globally optimizing agents; I actually see that as the "default" outcome). I think this is where I disagree. I'd offer two main reasons not to believe this: 1. Children learn to follow common sense, despite not having (explicit) meta-ethical and meta-normative beliefs at all. (Though you could argue that the relevant meta-ethical and meta-normative concepts are inherent in / embedded in / compiled into the human brain's "priors" and learning algorithm.) 2. Intuitively, it seems like sufficiently good imitations of humans would have to have (perhaps implicit) knowledge of "common sense". We can see this to some extent, where GPT-3 demonstrates implicit knowledge of at least some aspects of common sense (though I do not claim that it acts in accordance with common sense). (As a sanity check, we can see that neither of these arguments would apply to the "learning human values" case.) I'm going to assume that Quality Y is "normative" if determining whether an object X has quality Y depends on who is evaluating Y. Put another way, an independent race of aliens that had never encountered humans would probably not converge to the same judgments as we do about quality Y. This feels similar to the is-ought distinction: you cannot determine "ought" facts from "is" facts, because "ought" facts are normative, whereas "is" facts are not (though perhaps you disagree with the latter). I think "common sense is normative" is sufficient to argue that a race of aliens could not build an AI system that had our common sense, without either the aliens or the AI system figuring out the right meta-normative concepts for humanity (which they presumably could not do without encountering humans first). I don't see why it implies that we cannot build an AI system that has our common sense. Even if our common sense is normative, its effects are widespread; it should be possib
Hierarchical planning: context agents

In exchange for the mess, we get a lot closer to the structure of what humans think when they imagine the goal of "doing good." Humans strive towards such abstract goals by having a vague notion of what it would look and feel like, and by breaking down those goals into more concrete sub-tasks. This encodes a pattern of preferences over universe-histories that treats some temporally extended patterns as "states."

Thank you for writing this post! I've had very similar thoughts for the past year or so, and I think the quote above is exactly right. IMO, part of... (read more)

3Charlie Steiner1y
Oh wait, are you the first author on this paper? I didn't make the connection until I got around to reading your recent post. So when you talk about moving to a hierarchical human model, how practical do you think it is to also move to a higher-dimensional space of possible human-models, rather than using a few hand-crafted goals? This necessitates some loss function or prior probability over models, and I'm not sure how many orders of magnitude more computationally expensive it makes everything.
1Charlie Steiner1y
Sorry for being slow :) No, I haven't read anything of Bratman's. Should I? The synopsis looks like it might have some interesting ideas but I'm worried he could get bogged down in what human planning "really is" rather than what models are useful. I'd totally be happy to chat either here or in PMs. Full Bayesian reasoning seems tricky if the environment is complicated enough to make hierarchical planning attractive - or do you mean optimizing a model for posterior probability (the prior being something like MML?) by local search? I think one interesting question there is if it can learn human foibles. For example, suppose we're playing a racing game and I want to win the race, but fail because my driving skills are bad. How diverse a dataset about me do you need to actually be able to infer that a) I am capable of conceptualizing how good my performance is b) I wanted it to be good c) It wasn't good, from a hierarchical perspective, because of the lower-level planning faculties I have. I think maybe you could actually learn this only from racing game data (no need to make an AGI that can ask me about my goals and do top-down inference), so long as you had diverse enough driving data to make the "bottom-up" generalization that my low-level driving skill can be modeled as bad almost no matter the higher-level goal, and therefore it's simplest to explain me not winning a race by taking the bad driving I display elsewhere as a given and asking what simple higher-level goal fits on top.
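
A toy version of the racing-game argument, as a sketch only: treat the driver as a two-level model with a high-level goal ("win" or "cruise") and a low-level skill ("high" or "low"), and compute the posterior over goals by enumeration. All of the numbers, the uniform prior, and the names `P_WIN_RACE`, `P_CRASH` and `posterior_goal_win` are invented for illustration and are not taken from any paper or dataset.

```python
from itertools import product

GOALS = ("win", "cruise")
SKILLS = ("high", "low")

# Assumed chance of winning the race for each (goal, skill) pair.
P_WIN_RACE = {
    ("win", "high"): 0.8,
    ("win", "low"): 0.1,
    ("cruise", "high"): 0.05,
    ("cruise", "low"): 0.05,
}
# Assumed per-segment crash rate; depends on skill only, not on the goal.
P_CRASH = {"high": 0.1, "low": 0.6}

def posterior_goal_win(lost_race, crashes, segments):
    """P(goal = 'win' | evidence), with a uniform prior over (goal, skill)."""
    joint = {}
    for goal, skill in product(GOALS, SKILLS):
        p = 0.25
        if lost_race:
            p *= 1.0 - P_WIN_RACE[(goal, skill)]
        pc = P_CRASH[skill]
        p *= pc ** crashes * (1.0 - pc) ** (segments - crashes)
        joint[(goal, skill)] = p
    z = sum(joint.values())
    return sum(p for (g, _), p in joint.items() if g == "win") / z

# Only the lost race is observed: "didn't care about winning" looks likelier.
before = posterior_goal_win(lost_race=True, crashes=0, segments=0)
# Add low-level driving data from elsewhere: five segments, five crashes.
after = posterior_goal_win(lost_race=True, crashes=5, segments=5)
```

With these made-up numbers, the extra low-level data moves the posterior on "wanted to win" from roughly 0.37 back up to roughly 0.49: once bad driving is established independently of the goal, the loss stops counting as much evidence against wanting to win. Whether real driving data is diverse enough to support that bottom-up generalization is exactly the open question raised above.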
Focus: you are allowed to be bad at accomplishing your goals

Thanks for writing up this post! It's really similar in spirit to some research I've been working on with others, which you can find on arXiv. We also model bounded goal-directed agents by assuming that the agent is running some algorithm given bounded compute, but our approach differs in the following ways:

  • We don't attempt to compute full policies over the state space, since this is generally intractable, and also cognitively implausible, at least for agents like ourselves. Instead, we compute (par
... (read more)
2Adam Shimi2y
Sorry for the delay in answering. Your paper looks great! It seems to tackle in a clean and formal way what I was vaguely pointing at. We're currently reading a lot of papers and blog posts to prepare for an in-depth literature review about goal-directedness, and I added your paper to the list. I'll try to come back here and comment after I read it.