Richard Ngo

Former AI safety research engineer, now AI governance researcher at OpenAI. Blog:


Shaping safer goals
AGI safety from first principles


Ngo and Yudkowsky on AI capability gains

My recommended policy in cases where this applies is "trust your intuitions and operate on the assumption that you're not a crackpot." 

Oh, certainly Eliezer should trust his intuitions and believe that he's not a crackpot. But I'm not arguing about what the person with the theory should believe, I'm arguing about what outside observers should believe, if they don't have enough time to fully download and evaluate the relevant intuitions. Asking the person with the theory to give evidence that their intuitions track reality isn't modest epistemology.

Ngo and Yudkowsky on AI capability gains

the easiest way to point out why they are dumb is with counterexamples. We can quickly "see" the counterexamples. E.g., if you're trying to see AGI as the next step in capitalism, you'll be able to find counterexamples where things become altogether different (misaligned AI killing everything; singleton that brings an end to the need to compete).

I'm not sure how this would actually work. The proponent of the AGI-capitalism analogy might say "ah yes, AGI killing everyone is another data point on the trend of capitalism becoming increasingly destructive". Or they might say (as Marx did) that capitalism contains the seeds of its own destruction. Or they might just deny that AGI will play out the way you claim, because their analogy to capitalism is more persuasive than your analogy to humans (or whatever other reasoning you're using). How do you then classify this as a counterexample rather than a "non-central (but still valid) manifestation of the theory"?

My broader point is that these types of theories are usually sufficiently flexible that they can "predict" most outcomes, which is why it's so important to pin them down by forcing them to make advance predictions.

On the rest of your comment, +1. I think that one of the weakest parts of Eliezer's argument was when he appealed to the difference between von Neumann and the village idiot in trying to explain why the next step above humans will be much more consequentialist than most humans (although unfortunately I failed to pursue this point much in the dialogue).

Ngo and Yudkowsky on AI capability gains

Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn't been tried. If anything, it's the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn't written anywhere near as extensively on object-level AI safety.

This has been valuable for community-building, but less so for making intellectual progress - because in almost all domains, the most important way to make progress is to grapple with many object-level problems, until you've developed very good intuitions for how those problems work. In the case of alignment, it's hard to learn things from grappling with most of these problems, because we don't have signals of when we're going in the right direction. Insofar as Eliezer has correct intuitions about when and why attempted solutions are wrong, those intuitions are important training data.

By contrast, trying to first agree on very high-level epistemological principles, and then do the object-level work, has a very poor track record. See how philosophy of science has done very little to improve how science works; and how reading the sequences doesn't improve people's object-level rationality very much.

I model you as having a strong tendency to abstract towards higher-level discussion of epistemology in order to understand things. (I also have a strong tendency to do this, but I think yours is significantly stronger than mine.) I expect that there's just a strong clash of intuitions here, which would be hard to resolve. But one prompt which might be useful: why aren't epistemologists making breakthroughs in all sorts of other domains?

Ngo and Yudkowsky on AI capability gains

I don't expect such a sequence to be particularly useful, compared with focusing on more object-level arguments. Eliezer says that the largest mistake he made in writing his original sequences was that he "didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory". Better, I expect, to correct the specific mistakes alignment researchers are currently making, until people have enough data points to generalise better.

Ngo and Yudkowsky on AI capability gains

it seems to me that you want properly to be asking "How do we know this empirical thing ends up looking like it's close to the abstraction?" and not "Can you show me that this abstraction is a very powerful one?"

I agree that "powerful" is probably not the best term here, so I'll stop using it going forward (note, though, that I didn't use it in my previous comment, which I endorse more than my claims in the original debate).

But before I ask "How do we know this empirical thing ends up looking like it's close to the abstraction?", I need to ask "Does the abstraction even make sense?" Because you have the abstraction in your head, and I don't, and so whenever you tell me that X is a (non-advance) prediction of your theory of consequentialism, I end up in a pretty similar epistemic state as if George Soros tells me that X is a prediction of the theory of reflexivity, or if a complexity theorist tells me that X is a prediction of the theory of self-organisation. The problem in those two cases is less that the abstraction is a bad fit for this specific domain, and more that the abstraction is not sufficiently well-defined (outside very special cases) to even be the type of thing that can robustly make predictions.

Perhaps another way of saying it is that they're not crisp/robust/coherent concepts (although I'm open to other terms, I don't think these ones are particularly good). And it would be useful for me to have evidence that the abstraction of consequentialism you're using is a crisper concept than Soros' theory of reflexivity or the theory of self-organisation. If you could explain the full abstraction to me, that'd be the most reliable way - but given the difficulties of doing so, my backup plan was to ask for impressive advance predictions, which are the type of evidence that I don't think Soros could come up with.

I also think that, when you talk about me being raised to hold certain standards of praiseworthiness, you're still ascribing too much modesty epistemology to me. I mainly care about novel predictions or applications insofar as they help me distinguish crisp abstractions from evocative metaphors. To me it's the same type of rationality technique as asking people to make bets, to help distinguish post-hoc confabulations from actual predictions.

Of course there's a social component to both, but that's not what I'm primarily interested in. And of course there's a strand of naive science-worship which thinks you have to follow the Rules in order to get anywhere, but I'd thank you to assume I'm at least making a more interesting error than that.

Lastly, on probability theory and Newtonian mechanics: I agree that you shouldn't question how much sense it makes to use calculus in the way that you described, but that's because the application of calculus to mechanics is so clearly-defined that it'd be very hard for the type of confusion I talked about above to sneak in. I'd put evolutionary theory halfway between them: it's partly a novel abstraction, and partly a novel empirical truth. And in this case I do think you have to be very careful in applying the core abstraction of evolution to things like cultural evolution, because it's easy to do so in a confused way.

Ngo and Yudkowsky on AI capability gains

I'm still trying to understand the scope of expected utility theory, so examples like this are very helpful! I'd need to think much more about it before I had a strong opinion about how much they support Eliezer's applications of the theory, though.

Ngo and Yudkowsky on AI capability gains

Not a problem. I share many of your frustrations about modesty epistemology and about most alignment research missing the point, so I sympathise with your wanting to express them.

On consequentialism: I imagine that it's pretty frustrating to keep having people misunderstand such an important concept, so thanks for trying to convey it. I currently feel like I have a reasonable outline of what you mean (e.g. to the level where I could generate an analogy about as good as Nate's laser analogy), but I still don't know whether the reason you find it much more compelling than I do is because you understand the details and implications better, or because you have different intuitions about how to treat high-level abstractions (compared with the intuitions I describe here).

At some point when I have a few spare days, I might try to write down my own best understanding of the concept, and try to generate some of those useful analogies and intuition pumps, in the hope that explaining it from a different angle will prove fruitful. Until then, other people who try to do so should feel free to bounce drafts off me.

Ngo and Yudkowsky on AI capability gains

My model of Eliezer says that there is some deep underlying concept of consequentialism, of which the "not very coherent consequentialism" is a distorted reflection; and that this deep underlying concept is very closely related to expected utility theory. (I believe he said at one point that he started using the word "consequentialism" instead of "expected utility maximisation" mainly because people kept misunderstanding what he meant by the latter.)

I don't know enough about conservative vector fields to comment, but on priors I'm pretty skeptical of this being a good example of coherent utilities; I also don't have a good guess about what Eliezer would say here.

Ngo and Yudkowsky on AI capability gains

Thanks! I think that this is a very useful example of an advance prediction of utility theory; and that gathering more examples like this is one of the most promising ways to make progress on bridging the gap between Eliezer's and most other people's understandings of consequentialism.

Ngo and Yudkowsky on AI capability gains

My objection is mostly fleshed out in my other comment. I'd just flag here that "In other words, you have to do things the 'hard way'--no shortcuts" assigns the burden of proof in a way which I think is not usually helpful. You shouldn't believe my argument that I have a deep theory linking AGI and evolution unless I can explain some really compelling aspects of that theory. Because otherwise you'll also believe in the deep theory linking AGI and capitalism, and the one linking AGI and symbolic logic, and the one linking intelligence and ethics, and the one linking recursive self-improvement with cultural evolution, etc etc etc.

Now, I'm happy to agree that all of the links I just mentioned are useful lenses which help you understand AGI. But for utility theory to do the type of work Eliezer tries to make it do, it can't just be a useful lens - it has to be something much more fundamental. And that's what I don't think Eliezer's established.