All of Ben Pace's Comments + Replies

Agency and the unreliable autonomous car

What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

Curated, in part for this episode, and also as a celebration of the whole series. I've listened to 6 out of the 9, and I've learned a great deal about people's work and their motivations for it. This episode in particular was excellent because I finally learned what a finite factored set was – your example of the Cartesian plane was really helpful! Which is a credit to your communication skills.

Basically every episode has been worthwhile and valuable for me, it's been easy to sit down with a researcher and hear them explain their research, and Daniel alway...

I'm glad to hear that the podcast is useful for people :)

Musings on general systems alignment

That’s an inspiring narrative that rings true to me, I’m sure I will think on that framing more. Thank you.

Rogue AGI Embodies Valuable Intellectual Property

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market,

Basic econ q here: I think that 2 competitors can often cut the profits by much more than half, because they can always undercut each other until they hit the cost of production. Especially if you're going from 1 seller to 2, I think that can shift a market from monopoly to not-a-monopoly, so I think it might be a lot less valuable.

Still, obviously likely to be worth it to the second company, so I totally expect the competition to happen.

[1] Mark Xu (1mo): Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the price down to the marginal cost of production, which is very low for software. I briefly tried getting empirical data for this, but couldn't find it, plausibly since I didn't really know the right search terms.

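The undercutting argument in this thread can be made concrete with a small sketch. It assumes the textbook case of two sellers offering an identical product, a linear demand curve, and made-up numbers; the function names `undercut_until_cost` and `industry_profit` are hypothetical and appear in neither the post nor the comments. It is an illustration of the reasoning, not a model of the actual market being discussed.

```python
def undercut_until_cost(start_price, marginal_cost, step=1):
    """Alternate undercutting between two sellers of an identical product.

    Each round one seller undercuts the other by `step`, as long as the new
    price is still at or above marginal cost. Returns (final_price, rounds).
    """
    price = start_price
    rounds = 0
    while price - step >= marginal_cost:
        price -= step
        rounds += 1
    return price, rounds


def industry_profit(price, marginal_cost, demand_intercept):
    """Total profit under an assumed linear demand: quantity = intercept - price."""
    quantity = max(demand_intercept - price, 0)
    return (price - marginal_cost) * quantity


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration.
    cost, intercept = 10, 110
    # Profit-maximizing price for a single seller facing this demand curve.
    monopoly_price = (intercept + cost) // 2
    final_price, rounds = undercut_until_cost(monopoly_price, cost)
    print("monopoly profit:", industry_profit(monopoly_price, cost, intercept))
    print("profit after undercutting:", industry_profit(final_price, cost, intercept))
```

With these made-up numbers the single seller earns 2500, while the two undercutting sellers end at a price equal to cost and zero total profit, which is far more than a halving. Product differentiation, capacity limits, or tacit collusion would plausibly leave profits somewhere in between.
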
Finite Factored Sets

Curated. This is a fascinating framework that (to the best of my understanding) makes substantive improvements on the Pearlian paradigm. It's also really exciting that you found a new simple sequence. 

Re: the writeup, it's explained very clearly, the Q&A interspersed is a very nice touch. I like that the talk factorizes.

I really appreciate the research exploration you do around ideas of agency and am very happy to celebrate the writeups like this when you produce them.

Knowledge is not just map/territory resemblance

The original lesswrong 1.0 had the following header at the top of each page, pointing at a certain concept of map/territory resemblance:

I don't remember the image you show. I looked it up, I don't see this header on the wayback machine. I see a map atop this post in 2009 and then not too long after it becomes the grey texture that stayed until LW 2.0. Where did you get your image from?

[2] Alex Flint (2mo): Dang, the images in this post are totally off. I have a script that converts a google doc to markdown, then I proofread the markdown, but the images don't show up in the editor, and it looks like my script is off. Will fix tomorrow. Update: fixed

Abstraction Talk

Yeah, we can have a try and see whether it ends up being worth publishing.

Abstraction Talk

Nice. I'll get a transcript made and share it with you for edits.

[2] johnswentworth (2mo): Heads up, there's a lot of use of visuals - drawing, gesturing at things, etc - so a useful transcript may take some work.

Testing The Natural Abstraction Hypothesis: Project Intro

Curated. This is a pretty compelling research line and seems to me like it has the potential to help us a great deal in understanding how to interface with, understand, and align machine intelligence systems. It's also the compilation of a bunch of good writing+work from you that I'd like to celebrate, and it's something of a mission statement for the ongoing work.

I generally love all the images and like the way it adds a bunch of prior ideas together.

Formal Inner Alignment, Prospectus

Curated. Solid attempt to formalize the core problem, and solid comment section from lots of people.

Agency in Conway’s Game of Life

I recall once seeing someone say with 99.9% probability that the sun would still rise 100 million years from now, citing information about the life-cycle of stars like our sun. Someone else pointed out that this was clearly wrong, that by default the sun would be taken apart for fuel on that time scale, by us or some AI, and that this was a lesson in people's predictions about the future being highly inaccurate.

But also, "the thing that means there won't be a sun sometime soon" is one of the things I'm pointing to when talking about "general intelligence". This post reminded me of that.

AMA: Paul Christiano, alignment researcher


(If both parties are interested in that debate I’m more than happy to organize it in whatever medium and do any work like record+transcripts or book an in-person event space.)

AMA: Paul Christiano, alignment researcher

The stuff about ‘alien’ knowledge sounds really fascinating, and I’d be excited about write-ups. All my concrete intuitions here come from reading Distill.Pub papers.

AMA: Paul Christiano, alignment researcher

Huh, am surprised. Guess I might’ve predicted Boston. Curious if it’s because of the culture, the environment, or what.

[3] Paul Christiano (3mo): Don't read too much into it. I do dislike Boston weather.

AMA: Paul Christiano, alignment researcher

Most people, or most people you know.

And “should” = given their own goals.

I’m asking what you think people might be wrong about. And very slightly hoping for product recommendations :)

AMA: Paul Christiano, alignment researcher

I want to know this question, but for the ‘peak’ alignment researcher.

[3] Paul Christiano (3mo): My answer isn't sensitive to things like "how good are you at research" (I didn't even express the sensitivity to "how much do you like reflecting" or "how old are you" which I think are more important). I guess probably the first order thing is that the 'peak' alignment researcher is more likely to be older and closer to death so investing somewhat less in getting better at things. (But the world changes and lives are long so I'm not sure it's a huge deal.)

AMA: Paul Christiano, alignment researcher

If you could magically move most of the US rationality and x-risk and EA community to a city in the US that isn't the Bay, and you had to pick somewhere, where would you move them to?

If I'm allowed to think about it first then I'd do that. If I'm not, then I'd regret never having thought about it, probably Seattle would be my best guess.

AMA: Paul Christiano, alignment researcher

And on an absolute level, is the world much more or less prepared for AGI than it was 15 years ago? 

Follow-up: How much, if at all, did the broader x-risk community change it?

[4] Paul Christiano (3mo): I think much better. I don't really know / tough to answer. Certainly there's a lot more people talking about the problem, it's hard to know how much that comes from x-risk community or from vague concerns about AI in the world (my guess is big parts of both). I think we are in a better place with respect to knowledge of technical alignment---we know a fair bit about what the possible approaches are and have taken a lot of positive steps. There is a counterfactual where alignment isn't even really recognized as a distinct problem and is just lumped in with vague concerns about safety, which would be significantly worse in terms of our ability to work productively on the problem (though I'd love if we were further away from that world).

AMA: Paul Christiano, alignment researcher

Why did nobody in the world run challenge trials for the covid vaccine and save us a year of economic damage?

Wild speculation, not an expert. I'd love to hear from anyone who actually knows what's going on.

I think it's overoptimistic that human challenge trials would save a year, though it does seem like they could plausibly have saved weeks or months if done in the most effective form. (And in combination with other human trials and moderate additional spending I'd definitely believe 6-12 months of acceleration was possible.)

In terms of why so few human experiments have happened in general, I think it's largely because of strong norms designed to protect ex...

AMA: Paul Christiano, alignment researcher

Which rationalist virtue do you identify with the strongest currently? Which one would you like to get stronger at?

AMA: Paul Christiano, alignment researcher

Paul, if you did an episode of AXRP, which two other AXRP episodes do you expect your podcast would be between, in terms of quality? For this question, collapse all aspects of quality into a scalar.

AMA: Paul Christiano, alignment researcher

Do you have any specific plans for your life in a post-singularity world?

Not really.

I expect that many humans will continue to participate in a process of collectively clarifying what we want and how to govern the universe. I wouldn't be surprised if that involves a lot of life-kind-of-like-normal that gradually improves in a cautious way we endorse rather than some kind of table-flip (e.g. I would honestly not be surprised if post-singularity we still end up raising another generation because there's no other form of "delegation" that we feel more confident about). And of course in such a world I expect to just continue to spe...

AMA: Paul Christiano, alignment researcher

What were your main updates from the past few months?

[6] Paul Christiano (3mo): Lots of in-the-weeds updates about theory, maybe most interestingly that "tell me what I want to hear" models are a larger fraction of long-term (i.e. not-resolved-with-scale-and-diversity) generalization problems than I'd been imagining. I've increased my probability on fast takeoff in the sense of successive doublings being 4-8x faster instead of 2x faster, by taking more seriously the possibility "if you didn't hit diminishing-marginal-returns in areas like solar panels, robotics, and software, current trends would actually imply faster-than-industrial-revolution takeoff even without AI weirdness." That's not really a bayesian update, just a change in beliefs.

AMA: Paul Christiano, alignment researcher

Who is right between Eliezer and Robin in the AI FOOM debate?

I mostly found myself more agreeing with Robin, in that e.g. I believe previous technical change is mostly a good reference class, and that Eliezer's AI-specific arguments are mostly kind of weak. (I liked the image, I think from that debate, of a blacksmith emerging into the town square with his mighty industry and making all bow before him.)

That said, I think Robin's quantitative estimates/forecasts are pretty off and usually not very justified, and I think he puts too much stock on an outside view extrapolation from past transitions rather than looking at t...

AMA: Paul Christiano, alignment researcher

What should people be spending more money on?

[2] Paul Christiano (3mo): Which people? (And whose "should"?) Maybe public goods, software, and movies?

AMA: Paul Christiano, alignment researcher

What important truth do very few people in your community/network agree with you on?

Unfortunately (fortunately?) I don't feel like I have access to any secret truths. Most idiosyncratic things I believe are pretty tentative, and I hang out with a lot of folks who are pretty open to the kinds of weird ideas that might have ended up feeling like Paul-specific secret truths if I hung with a more normal crowd. 

It feels like my biggest disagreement with people around me is something like: to what extent is it likely to be possible to develop an algorithm that really looks on paper like it should just work for aligning powerful ML systems....

AMA: Paul Christiano, alignment researcher

Let me ask the question Daniel Filan is too polite to ask: would you like to be interviewed on your research for an episode of the AXRP podcast? 

That's not the AXRP question I'm too polite to ask.

AMA: Paul Christiano, alignment researcher

What is the main mistake you've made in your research, that you were wrong about?

Positive framing: what's been the biggest learning moment in the course of your work?

Basically every time I've shied away from a solution because it feels like cheating, or like it doesn't count / address the real spirit of the problem, I've regretted it. Often it turns out it really doesn't count, but knowing exactly why (and working on the problem with no holds barred) has been really important for me.

The most important case was dismissing imitation learning back in 2012-2014, together with basically giving up outright on all ML approaches, which I only recognized as a problem when I was writing up why those approaches were doomed more carefully and why imitation learning was a non-solution.

AMA: Paul Christiano, alignment researcher

What work are you most proud of? 

Slightly different: what blog post are you most proud of?

I don't have an easy way of slicing my work up / think that it depends on how you slice it. Broadly I think the two candidates are (i) making RL from human feedback more practical and getting people excited about it at OpenAI, (ii) the theoretical sequence from approval-directed agents and informed oversight to iterated amplification to getting a clear picture of the limits of iterated amplification and setting out on my current research project. Some steps of that were really hard for me at the time though basically all of them now feel obvious.

My favorit...

AMA: Paul Christiano, alignment researcher

Who's the best critic of your alignment research? What have they been right about?

AMA: Paul Christiano, alignment researcher

What was your biggest update about the world from living through the coronavirus pandemic?

Follow-up: does it change any of your feelings about how civilization will handle AGI?

I found our COVID response pretty "par for the course" in terms of how well we handle novel challenges. That was a significant negative update for me because I had a moderate probability on us collectively pulling out some more exceptional adaptiveness/competence when an issue was imposing massive economic costs and had a bunch of people's attention on it. I now have somewhat more probability on AI dooms that play out slowly where everyone is watching and yelling loudly about it but it's just really tough to do something that really improves the situation (and correspondingly more total probability on doom). I haven't really sat down and processed this update or reflected on exactly how big it should be.

AMA: Paul Christiano, alignment researcher

What's a direction you'd like to see the rationality community grow stronger in over the coming 5-10 years?

More true beliefs (including especially about large numbers of messy details rather than a few central claims that can receive a lot of attention).

AMA: Paul Christiano, alignment researcher

Do you know what sorts of people you're looking to hire? How much do you expect ARC to grow over the coming years, and what will the employees be doing? I can imagine it being a fairly small group of like 3 researchers and a few understudies, I can also imagine it growing to 30 people like MIRI. Which one of these is it closer to?

I'd like to hire a few people (maybe 2 researchers median?) in 2021. I think my default "things are going pretty well" story involves doubling something like every 1-2 years for a while. Where that caps out / slows down a lot depends on how the field shapes out and how broad our activities are. I would be surprised if I wanted to stop growing at <10 people just based on the stuff I really know I want to do.

The very first hires will probably be people who want to work on the kind of theory I do, since right now that's what I'm feeling most excited about ...

AMA: Paul Christiano, alignment researcher

What are the main ways you've become stronger and smarter over the past 5 years? This isn't a question about new object-level beliefs so much as ways-of-thinking or approaches to the world that have changed for you.

[3] Paul Christiano (3mo): I'm changing a lot less with every successive 5-year interval. The last 5 years was the end of grad school and my time at OpenAI. I certainly learned a lot about how to make ML work in practice (start small, prioritize simple cases where you can debug, isolate assumptions). Then I learned a lot about how to run a team. I've gotten better at talking to people and writing and being broadly functional (making up for some lost time when I was younger and focused on math instead). I don't think there's any simple slogan for new ways-of-thinking or changed approaches to the world. Mostly just seems like a ton of little stuff. I think earlier phases of my life were more likely to be a shift in an easily described direction, but this time it's been more a messy mix---I became more arrogant in some ways and more humble in others, more optimistic in some ways and more pessimistic in others, more inclined to trust on-paper reasoning in some ways and less in others, etc.

AMA: Paul Christiano, alignment researcher

I'm not interested in the strongest argument from your perspective (i.e. the steelman), but I am interested in how much you think you can pass the ITT for Eliezer's perspective on the alignment problem: what shape the problem is, why it's hard, and how to make progress. Can you give a sense of the parts of his ITT you think you've got?

I think I could do pretty well (it's plausible to me that I'm the favorite in any head-to-head match with someone who isn't a current MIRI employee? probably not but I'm at least close). There are definitely some places I still get surprised and don't expect to do that well, e.g. I was recently surprised by one of Eliezer's positions regarding the relative difficulty of some kinds of reasoning tasks for near-future language models (and I expect there are similar surprises in domains that are less close to near-term predictions). I don't really know how to split it into parts for the purpose of saying what I've got or not.

AMA: Paul Christiano, alignment researcher

What is your top feature request for

[4] Paul Christiano (3mo): When I begin a comment with a quotation, I don't know how to insert new un-quoted text at the top (other than by cutting the quotation, adding some blank lines, then pasting the quotation back). That would be great. Also moderate performance improvements. And then maybe a better feed that gives me the content I'm most likely to see? That's a tough thing to design but could add significant value.

[4] Paul Christiano (3mo): ...And I show you how deep the rabbit hole goes. Maybe Guided by the Beauty of our Weapons if fiction doesn't count. (I expect I'd think of a better post than this one if I thought longer, but not a better post than the black pill story.)

AMA: Paul Christiano, alignment researcher

Other than by doing your own research, from where or whom do you tend to get valuable research insights?

AMA: Paul Christiano, alignment researcher

What works of fiction / literature have had the strongest impact on you? Or perhaps, that are responsible for the biggest difference in your vector relative to everyone else's vector?

(e.g. lots of people were substantially impacted by the Lord of the Rings, but perhaps something else had a big impact on you that led you in a different direction from all those people)

(that said, LotR is a fine answer)

AMA: Paul Christiano, alignment researcher

Did you get much from reading the sequences? What was one of the things you found most interesting or valuable personally in them?

I enjoyed Leave a Line of Retreat. It's a very concrete and simple procedure that I actually still use pretty often and I've benefited a lot just from knowing about. Other than that I think I found a bunch of the posts interesting and entertaining. (Looking back now the post is a bit bombastic, I suspect all the sequences are, but I don't really mind.)

Announcing the Alignment Research Center

You're gonna get back to thesis writing quickly, it's a very short form.

[3] Adam Shimi (3mo): This is so great! I always hate wishing people luck when I trust in their competence to mostly deal with bad luck and leverage good luck. I'll use that one now.

Coherence arguments imply a force for goal-directed behavior

Curated. Felt to me like a valuable step in this conversation, and its analysis of some details was helpful to me. Thanks for writing it.

Probability theory and logical induction as lenses

Great post, I’m glad this is written up nicely.

One section was especially interesting to me:

If the credences you assign to your beliefs obey the logical induction criterion, then you will get such-and-such benefits.

In the case of logical induction, the benefits are things like coherence, convergence, timeliness, and unbiasedness. But different from probability theory, these concepts are operationalized as properties of the evolution of your credences over time, rather than as properties of your credences at any particular point in time.

Emphasis added.

I ...

[2] Alex Flint (3mo): Yeah, I agree, logical induction bakes in the concept of time in a way that probability theory does not. And yeah, it does seem necessary, and I find it very interesting when I squint at it.

Where are intentions to be found?

This reminds me that it's hard for me to say where "I" am, in both space and time.

I read a story recently (which I'm going to butcher because I don't remember the URL), about a great scientist who pulled a joke: after he died, his wife had a seance or used a ouija board or something, which told her to look at the first sentence of the 50th page of his book, and the first sentence was "<The author> loved to find creative ways to communicate with people."

After people die, their belongings and home often contain an essence of 'them'. I think that some p...

[2] Alex Flint (3mo): Yes, I agree. I once stayed in Andrew Critch's room for a few weeks while he was out of town. I felt that I was learning from him in his absence because he had all these systems and tools and ways that things were organized. I described it at the time as "living inside Critch's brain for two weeks", which was a great experience. Thanks Critch!

Another (outer) alignment failure story

Curated. This was perhaps the most detailed yet informative story I've read about how failure will go down. As you say at the start it's making several key assumptions, it's not your 'mainline' failure story. Thx for making the assumptions explicit, and discussing how to vary them at the end. I'd like to see more people write stories under different assumptions.

The sorts of stories Eliezer has told in the past have focused on 10-1000x faster takeoffs than discussed here, so those stories are less extended (you kinda just wake up one day then everyo...

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

At this point, my plan is to try to consolidate what I think are the main confusions in the comments of this post, into one or more new concepts to form the topic of a new post.

Sounds great! I was thinking myself about setting aside some time to write a summary of this comment section (as I see it).

My Current Take on Counterfactuals

I've felt like the problem of counterfactuals is "mostly settled" for about a year, but I don't think I've really communicated this online.

Wow that's exciting! Very interesting that you think that.

[3] Abram Demski (3mo): Now I feel like I should have phrased it more modestly, since it's really "settled modulo math working out", even though I feel fairly confident some version of the math should work out.

Reflective Bayesianism

The rules say we must use consequentialism, but good people are deontologists, and virtue ethics is what actually works.

—Eliezer Yudkowsky, Twitter
