Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

If you'd like to talk with me about your experience of the site, and let me ask you questions about it, book a conversation with me here: I'm currently available Thursday mornings, US West Coast Time (Berkeley, California).

Ben Pace's Comments

Challenges to Christiano’s capability amplification proposal

This post is close in my mind to Alex Zhu's post Paul's research agenda FAQ. They each helped to give me many new and interesting thoughts about alignment. 

This post was maybe the first time I'd seen an actual conversation about Paul's work between two people who had deep disagreements in this area - where Paul wrote things, someone wrote an effort-post response, and Paul responded once again. Eliezer did it again in the comments of Alex's FAQ, which also was a big deal for me in terms of learning.

Clarifying "AI Alignment"

Nominating this primarily for Rohin’s comment on the post, which was very illuminating.

Optimization Amplifies

I think the simple mathematical models here are very helpful in pointing to some intuitions about being confident systems will work even with major optimisation pressure applied, and why optimisation power makes things weird. I would like to see other researchers in alignment review this post, because I don't fully trust my taste on posts like these.

The Steering Problem

Reading this post was the first time I felt I understood what Paul's (and many others') research was motivated by. I think about it regularly, and it comes up in conversation a fair bit.

Beyond Astronomical Waste

This is a post that's stayed with me since it was published. The title is especially helpful as a handle. It is a simple reference for this idea, that there are deeply confusing philosophical problems that are central to our ability to attain most of the value we care about (and that this might be a central concern when thinking about AI).

It's not been very close to areas I think about a lot, so I've not tried to build on it much, and would be interested in a review from someone who thinks about these matters in more detail, but I expect they'll agree it's a very helpful post to exist.

Chris Olah’s views on AGI safety

Bouncing off a comment upthread, I wrote a post here with some general thoughts on security of powerful systems. I also gave an argument that marginal transparency doesn't translate into marginal security. I've mostly not figured out what its relation is to the OP, and below I'll just say that with a few more words. Note that I'm a layman when it comes to the details of ML, so am speaking like a 101 undergraduate or something like that.

It sounds to me like the transparency work being published at Distill, and some of its longer term visions (as outlined in the OP), if successful, are going to substantially increase how well we understand ML systems, and how useful they are.

From the perspective of security that I discuss in the linked post, I don’t personally have a good enough understanding of what Chris and others think their work is to be able to tell whether the transparency work being published in Distill will (as it progresses) make systems secure in the way I described in the link; plausibly that is a much further effort on top. Evan mentions the idea of turning an ML system into an understandable codebase the size of the Linux Kernel, which sounds breathtakingly ambitious and incredibly useful. Though, for example, it's typically very hard to make a codebase secure when you've been handed a system that didn't have security built in.

Relatedly, I don't feel I have a good understanding of the folks at Clarity's model of why (or whether) AI will not go well by default (to pick a concrete possibility: if we reach AGI by largely continuing the current trajectory of work that the field of ML is doing), where any big risks come from (to pick some clear possibilities: whether it’s broadly similar to Paul’s first model, second model, or neither), and what sort of technical work is required to prevent the bad outcomes. I'd be interested if Chris or anyone else working alongside him on Clarity feels like they can offer a crisp claim about how optimisation enters the system and what features of a transparent system mean that it's findable and removable with an appropriate amount of resources (though I don’t think that’s an easy ask or anything).

Chris Olah’s views on AGI safety

I was going to write a comment here, but it got a bit long so I made a post instead.

Challenges to Christiano’s capability amplification proposal

Reading a description of Paul's ideas in another researcher's words, written by someone sincerely attempting to explain why they didn't seem compelling to him, and reading Paul's replies, was really helpful in understanding Paul's research ideas, and I think about this post often (or at least, the ideas I formed reading it).
