Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

AI Alignment Writing Day 2019
AI Alignment Writing Day 2018



The alignment problem in different capability regimes

There’s a related dynamic that came up in a convo I just had.

Alice: My current work is exploring if we can solve value loading using reward learning.

Bob: Woah, isn’t that obviously doomed? Didn’t Rohin write a whole sequence on this?

Alice: Well, I don’t want to solve the whole problem for arbitrary difficulty. I just want to know whether we can build something that gets the basics right in distributions that a present day human can understand. For example, I reckon we may be able to teach an AI what murder is today, even if we can’t teach it what murder is post-singularity.

Bob: I see. That’s more reasonable. However, there is a school of thought that I suspect you may be a part of, called “slow takeoff”, which holds that over the course of maybe 10 years the world will become increasingly reliant on ML systems whose internals we don’t understand, and whose actions we don’t understand (we just see the metrics go up). That world is already out of distribution for what a present day human can understand.

Alice: That’s true. I guess I’m interested to know how far we can push it and still build a system that helps us take pivotal acts and doesn’t destroy everything.

(The above is my paraphrase. Both said different things than the above and also I incorporated some things I said.)

The success story you have in mind determines what problem you’re trying to solve, and to some extent which capability regime you’re thinking about.

MIRI/OP exchange about decision theory

I was in the chat and don't have anything especially to "disclose". Joe and Nick are both academic philosophers who've studied at Oxford and been at FHI, with a wide range of interests. And Abram and Scott are naturally great people to chat about decision theory with when they're available.

Garrabrant and Shah on human modeling in AGI

What’s the second half of the “versus” in this section? It’s probably straightforward, but I’d appreciate someone spelling it out.

Scott: And I'm basically distinguishing between a system that's learning how to do reasoning while being overseen and kept out of the convex hull of human modeling versus… And there are definitely trade-offs here, because you have more of a daemon problem or something if you're like, "I'm going to learn how to do reasoning," as opposed to, "I'm going to be told how to do reasoning from the humans." And so then you have to search over this richer space or something of how to do reasoning, which makes it harder.

Agency and the unreliable autonomous car

What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

Curated, in part for this episode, and also as a celebration of the whole series. I've listened to 6 out of the 9, and I've learned a great deal about people's work and their motivations for it. This episode in particular was excellent because I finally learned what a finite factored set was – your example of the Cartesian plane was really helpful, which is a credit to your communication skills.

Basically every episode has been worthwhile and valuable for me. It's been easy to sit down with a researcher and hear them explain their research, and Daniel always brings thoughtful and on-point questions. I would personally be very gratified to see a future where AXRP has 10x the number of episodes, where for any key piece of AI x-risk research I can listen to the author talk it through for 1-2 hours. Please keep going!

Added: it's also great that you make transcripts; they're really valuable for a lot of people, and for searchability.

Musings on general systems alignment

That’s an inspiring narrative that rings true to me, I’m sure I will think on that framing more. Thank you.

Rogue AGI Embodies Valuable Intellectual Property

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market,

Basic econ q here: I think that 2 competitors can often cut the profits by much more than half, because they can always undercut each other until they hit the cost of production. Especially if you're going from 1 seller to 2, I think that can shift a market from monopoly to not-a-monopoly, so I think it might be a lot less valuable.

Still, obviously likely to be worth it to the second company, so I totally expect the competition to happen.
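To make the undercutting point concrete, here is a toy sketch of textbook Bertrand price competition with identical products. It is not the post's model: the prices, cost, and demand are made-up numbers, demand is assumed fixed, and the function names are hypothetical. Two sellers take turns pricing one step below each other, and every customer buys from the cheapest seller.

```python
# Toy Bertrand duopoly: all numbers are hypothetical, chosen only for illustration.
MONOPOLY_PRICE = 100   # price a lone seller charges (assumed)
MARGINAL_COST = 10     # cost of serving one more customer (assumed)
DEMAND = 1_000         # customers per period, fixed regardless of price (assumed)


def undercut_until_cost(monopoly_price: int, marginal_cost: int, step: int = 1) -> list[int]:
    """Two sellers alternate pricing one step below the rival, stopping once
    a further cut would no longer beat marginal cost."""
    prices = [monopoly_price, monopoly_price]
    mover = 1  # seller 1 makes the first cut
    while True:
        candidate = prices[1 - mover] - step
        if candidate <= marginal_cost:  # undercutting again would earn nothing
            return prices
        prices[mover] = candidate
        mover = 1 - mover


def profits(prices: list[int], marginal_cost: int, demand: int) -> list[float]:
    """All customers buy from the cheapest seller; ties split demand evenly."""
    low = min(prices)
    winners = [i for i, p in enumerate(prices) if p == low]
    return [
        (p - marginal_cost) * demand / len(winners) if i in winners else 0.0
        for i, p in enumerate(prices)
    ]


monopoly_profit = (MONOPOLY_PRICE - MARGINAL_COST) * DEMAND
duopoly_prices = undercut_until_cost(MONOPOLY_PRICE, MARGINAL_COST)
duopoly_profit = sum(profits(duopoly_prices, MARGINAL_COST, DEMAND))
print(monopoly_profit, duopoly_prices, duopoly_profit)  # 90000 [12, 11] 1000.0
```

With these invented numbers the two sellers end up sharing about 1% of the monopoly profit rather than half of it. The exact endpoint depends on the tie-breaking rule; the point is only that profits collapse toward cost. Real markets for trained models would add product differentiation, capacity limits, and tacit collusion, any of which can keep prices well above cost.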

Finite Factored Sets

Curated. This is a fascinating framework that (to the best of my understanding) makes substantive improvements on the Pearlian paradigm. It's also really exciting that you found a new simple sequence. 

Re: the writeup, it's explained very clearly, and the interspersed Q&A is a very nice touch. I like that the talk factorizes.

I really appreciate the research exploration you do around ideas of agency and am very happy to celebrate the writeups like this when you produce them.
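The "new simple sequence" is, as far as I can tell, the number of factorizations of an n-element set. Below is a brute-force sketch that recomputes its first few terms, assuming the post's definition: a factorization of S is a set of nontrivial partitions of S (taken here to mean partitions with at least two parts) such that choosing one part from each partition always leaves exactly one element of S in the intersection. The function names are hypothetical, and the search blows up quickly, so it is only practical for small n.

```python
from itertools import combinations, product
from math import prod


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of frozensets."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partial in set_partitions(rest):
        # put `first` into each existing block...
        for i in range(len(partial)):
            yield partial[:i] + [partial[i] | {first}] + partial[i + 1:]
        # ...or into a new block of its own
        yield partial + [frozenset({first})]


def is_factorization(universe, factors):
    """True if every choice of one part per factor meets in exactly one element.
    With no factors, the single empty choice intersects to the whole universe."""
    for choice in product(*factors):
        common = set(universe)
        for part in choice:
            common &= part
        if len(common) != 1:
            return False
    return True


def count_factorizations(n):
    """Brute-force count of factorizations of {0, ..., n-1}, for n >= 1."""
    universe = frozenset(range(n))
    # nontrivial partitions only: at least two parts
    partitions = [p for p in set_partitions(list(range(n))) if len(p) >= 2]
    # each factor has >= 2 parts, so there are at most floor(log2(n)) factors
    max_factors = max(n.bit_length() - 1, 0)
    count = 0
    for k in range(max_factors + 1):
        for factors in combinations(partitions, k):
            # necessary condition: part counts must multiply to n
            if prod(len(p) for p in factors) != n:
                continue
            if is_factorization(universe, factors):
                count += 1
    return count


print([count_factorizations(n) for n in range(1, 7)])  # [1, 1, 1, 4, 1, 61]
```

For n = 6, one of the 61 is the discrete partition on its own; the other 60 are the ways of laying six elements out as a 2×3 grid, a finite analogue of the Cartesian-plane example from the AXRP episode.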
