Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

Longer bio:


AI Alignment Writing Day 2019

AI Alignment Writing Day 2018


AI Research Considerations for Human Existential Safety (ARCHES)

Wow, this is long, and seems pretty detailed and interesting. I'd love to see someone write a selection of key quotes or a summary.

The ground of optimization

On the topic of the black hole...

There’s a way of viewing the world as a series of ”forces”, each trying to control the future. Eukaryotic life is one. Black holes are another. We build many things, humans, from chairs to planes to AIs. Of those three, turning on the AI feels the most like “a new force has entered the game”. 

All these forces are fighting over the future, and while it’s odd to think of a black hole as an agent, sometimes when I look at it it does feel natural to think of physics as another optimisation force that’s playing the game with us.

The ground of optimization

Curated. Come on dude, stop writing so many awesome posts so quickly, it's too much.

This is a central question in the science of agency and optimization. The proposal is simple, you connected it to other ideas from Drexler and Demski+Garrabrant, and you gave a ton of examples of how to apply the idea. I generally get scared by the academic style, worried that the authors will fill out the text and make it really hard to read, but this was all highly readable, and set its own context (re-explaining the basic ideas at the start). I'm looking forward to you discussing it in the comments with Ricraz, Rohin and John.

Please keep writing these posts!

Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate

Thanks for the post, it is a helpful disjunction of possibilities and set of links to prior discussion.

I think that the post would be clearer if instead of sections called "Why I think we might be in this world" it had section with the same content called "Links to where people have discussed being in this world" or something similar. I'm not really sure why you use the title you do, it threw me for a bit.

What are the high-level approaches to AI alignment?

I'm not actually sure what you mean. I think 'seed AI' means something like 'first case in an iterative/recursive process' of self-improvement, which applies pretty well to the iterated amplification setup (which is a recursively self-improving AI) and lots of other examples that Evan wrote about in his 11-examples post. It still seems to me to be a pretty general term.

What are the high-level approaches to AI alignment?

I think the last one seems odd / doesn't make much sense. All agents have a decision theory, including RL-based agents, so it's not a distinctive approach. 

If you were attempting to describe MIRI's work, remember that they're trying to understand basic concepts of agency better (meta level, object level), not in order to directly put the new concepts into the AI (in the same way current AIs do not always have a section for the 'probability theory' to be written in) but in order to be much less confused about what we're trying to do.

So if you want to describe MIRI's work, you'd call it "getting less confused about the basic concepts" and then later building an AI via a mechanism we'll figure out after getting less confused. Right now it's not engineering, it's science.

Possible takeaways from the coronavirus pandemic for slow AI takeoff

Some good points, but on the contrary: a slow take-off is considered safer because we have more lead time and warning shots, but the world has seen many similar events and warning shots for covid. Ones that come to mind in the last two decades are swine flu, bird flu, and Ebola, and of course there have been many more over history.

This just isn’t that novel or surprising, billionaires like Bill Gates have been sounding the alarm, and still the supermajority of Western countries failed to take basic preventative measures. Those properties seem similar to even the slow take-off scenario. I feel like the fast-takeoff analogy would go through most strongly in a world where we'd just never seen this sort of pandemic before, but in reality we've seen many of them.

Possible takeaways from the coronavirus pandemic for slow AI takeoff

Thinking for a minute, I guess my unconditional probability of unaligned AI ending civilization (or something similar) is around 75%. It’s my default expected outcome.

That said, this isn’t a number I try to estimate directly very much, and I’m not sure if it would be the same after an hour of thinking about that number. Though I’d be surprised if I ended up giving more than 95% or less than 40%. 

Curious where yours is at?

Load More