Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

Longer bio: www.lesswrong.com/posts/aG74jJkiPccqdkK3c/the-lesswrong-team-page-under-construction#Ben_Pace___Benito

Sequences

AI Alignment Writing Day 2019
AI Alignment Writing Day 2018

Comments

AI Research Considerations for Human Existential Safety (ARCHES)

I listened to this yesterday! It was quite interesting; I'm glad I did.

Draft report on AI timelines

I expect the examples Ajeya has in mind are more like sharing one-line summaries in places that tend to be positively selected for virality and anti-selected for nuance (like tweets), but that substantive engagement by individuals here or in longer posts will be much appreciated.

Radical Probabilism

Thank you, they were all helpful. I'll write more if I have more questions.

("sadly that's unprobable to work" lol)

Radical Probabilism

Thank you, those points all helped a bunch. 

(I feel most resolved on the calibration one. If I think more about the other two and have more questions, I'll come back and write them.)

Radical Probabilism

While reading, I made notes on things I was confused about or that stood out to me. Here they are:

  • The post says that radical probabilism rejects #3-#5, but also that the Jeffrey update is derived from rigidity (#5), which sounds like a contradiction. (I feel most dumb about this bullet; it's probably obvious.)
  • The convergence section blew me away. The dialogue here correctly understood my confusion (why would I only believe either h(1/3) or h(2/3)) and then hit me with the 'embedded world models' point, and that was so persuasive. This felt really powerful, tying together some key arguments in this space.
  • I don’t get why the proof of conservation of expected evidence is relevant. It seems to assume that not only do I know how I will update, but that the bookie does too, which seems like an odd and overpowered assumption, and feels in tension with all the things you said about rigidity – why does the bookie get to know how I’ll update? (I've written out the statement I mean at the end of this list.)
  • "This has some implications for AI alignment, but I won't try to spell them out here." Such temptation! :)
  • I didn’t follow the argument that classical Bayesians don’t have calibration. I think it's just saying that classical Bayesianism has no machinery for self-reference, and that this is a big deal? I don't think this means Bayesians aren't calibrated, just that calibration isn't an explicit part of their model.
  • I do not understand how Jeffrey updates lead to path dependence. Is the trick that my probabilities can change without evidence, so I can update B without observing anything that also updates A, and then use that for hocus pocus? Writing that out, I think that's probably it, but as I was reading the essay I wasn't sure where the key step was happening. (I've also written out the Jeffrey update rule at the end of this list, to check I have it right.)
  • Okay, I got tired and skipped most of the virtual evidence section (it got tough for me). You say "Exchange Virtual Evidence" and I would be interested in a concrete example of what that kind of conversation would look like. I'm imagining it's something like "I thought for ages and changed my mind, let me tell you why".
  • Thanks for the stuff at the end about making the meta-Bayesian update. I wanted to read your thoughts on that; I would've been sad if it hadn't been there.
  • The examples of non-Bayesian updates I've been making are really valuable. I'll be noticing these more often.
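
(To check my own reading, here are the two statements I have in mind in the bullets above, written out as I understand them.)

Jeffrey update, when my probability in B shifts to a new value q without my having conditioned on B outright:

$$P_{\text{new}}(A) = q \, P(A \mid B) + (1 - q) \, P(A \mid \lnot B)$$

Conservation of expected evidence, i.e. the prior equals the expected posterior:

$$P(H) = P(E) \, P(H \mid E) + P(\lnot E) \, P(H \mid \lnot E)$$
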
Radical Probabilism

Sh*t. Wow. This is really impressive. 

Speaking for myself, this (combined with your orthodox case against utility functions) feels like the next biggest step for me since Embedded Agency in understanding what's wrong with our models of agency and how to improve them.

If I were to put it into words, I'm getting a strong vibe of "No really, you're starting the game inside the universe, stop assuming you've got all the hypotheses in your head and that you've got clean input-output, you need far fewer assumptions if you're going to get around this space at all." Plus a sense that this isn't 'weird' or 'impossibly confusing', and that actually these things will be able to make good sense.

All the details, though, are in the things you say about convergence, not knowing your own updates, and so on, which I don't have anything to add to.

Forecasting Thread: AI Timelines

(I can't see your distribution in your image.)

Forecasting Thread: AI Timelines

For example, a main consideration behind my prediction is the heuristic "With 50% probability, things will last twice as long as they already have," with a starting time of 1956, the year of the Dartmouth College summer AI conference.
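
(Spelled out, taking 2020 as the present year when this thread ran:)

$$2020 - 1956 = 64 \text{ years elapsed} \;\Rightarrow\; \text{median AGI date} \approx 1956 + 2 \times 64 = 2084$$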

A counter-hypothesis I've heard (not original to me): with 50% probability, we are currently half-way through the AI researcher-years required to get AGI.

I think this suggests much shorter timelines, as most of the field's researcher-years have been accrued in the last ~10 years.

It's not clear to me what reference class makes sense here, though. I feel like the 50% claim doesn't quite make sense: it implies that for every outstanding AI problem we're fifty percent of the way there (50% of the way to a rat brain, to a human emulation, to a vastly superintelligent AGI, etc.). It's not clear that a field being "done" is a natural category, nor which thing counts as "done" in this particular field.

Forecasting Thread: AI Timelines

Comment here if you have technical issues with the Elicit tool, with putting images in your comments, or with anything else.

Forecasting Thread: AI Timelines

Here's my quick forecast, to get things going. Probably if anyone asks me questions about it I'll realise I'm embarrassed by it and change it.

Link.

It has three buckets:

10%: We get to AGI with the current paradigm relatively quickly without major bumps.

60%: We get to it eventually sometime in the next ~50 years.

30%: We manage to move into a stable state where nobody can unilaterally build an AGI, then we focus on alignment for as long as it takes before we build it.

2nd attempt

Adele Lopez is right that 30% is super optimistic. Also, I accidentally put a bunch of probability mass within '2080-2100' instead of 'after 2100'. And I thought about it more. Here's my new one.

My distribution is the fat blue one.

Link.

It has four buckets:

20%: Current work leads directly to AGI in the next 15 years.

55%: There are some major bottlenecks; new insights are needed, plus some engineering projects comparable in size to the Manhattan Project. This is 2035 to 2070.

10%: This fills out 2070 to 2100.

15%: We manage to move to a stable state, or alternatively civilizational collapse / non-AI x-risk stops AI research. This is beyond 2100.
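
For anyone who wants to poke at the numbers, here's a minimal sketch of these buckets combined into a single distribution over arrival years; the assumption that probability is spread uniformly within each bucket is a simplification of mine, not something the forecast above commits to.

```python
# Sketch: combine the four buckets above into one CDF over the year AGI arrives.
# Assumes probability mass is spread uniformly within each finite bucket (my simplification).

BUCKETS = [
    (2020, 2035, 0.20),  # current work leads directly to AGI
    (2035, 2070, 0.55),  # major bottlenecks, new insights, Manhattan-Project-scale engineering
    (2070, 2100, 0.10),  # filling out the rest of the century
]
AFTER_2100 = 0.15        # stable state, collapse, or non-AI x-risk stops research

# The buckets should account for all of the probability mass.
assert abs(sum(m for _, _, m in BUCKETS) + AFTER_2100 - 1.0) < 1e-9

def cdf(year: float) -> float:
    """P(AGI arrives by `year`) under the bucketed forecast."""
    total = 0.0
    for start, end, mass in BUCKETS:
        if year >= end:
            total += mass
        elif year > start:
            total += mass * (year - start) / (end - start)
    return total  # caps at 0.85 for any finite year; the remaining 15% is "after 2100"

for y in (2030, 2050, 2070, 2100):
    print(f"P(AGI by {y}) ~= {cdf(y):.2f}")
```

This gives roughly 75% by 2070 and 85% by 2100, which matches the buckets by construction.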
