Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.

Longer bio:


AI Alignment Writing Day 2019
AI Alignment Writing Day 2018



[$20K in Prizes] AI Safety Arguments Competition

I think it would be less "off-putting" if we had common knowledge of it being such a post. From reading Sidney's comment, I don't think the authors see it that way.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).

For the record this does seem like the cruxy part of the whole discussion, and I think more concrete descriptions of alternatives would help assuage my concerns here.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.

This all seems like it would be good news. For the record I think that the necessary evidence to start acting has been around for decades if not longer (humans, evolution, computers, etc.) and I don’t bet on such a turning point arriving in the future (there is no fire alarm). Would be happy to see a credible argument to the contrary.

Also all the cool shit you can do with AI feels like it will apply orders of magnitude more pressure on the “economic forces are pushing our civilization to make more” side than the “oh some weird chance this shit FOOMs and ends the world so let’s regulate this really well” side.

To be clear, I’m not arguing for leaving regulatory efforts entirely in the hands of governments with no help or advice or infrastructural contributions from the tech sector. I’m just saying that there are many viable options for regulating AI technology without requiring one company or lab to do all the work or even make all the judgment calls.

You think there are many viable options; I would be interested in hearing three.

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.

Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room, with temperature sensors more like 15 feet away, in a city or country that's very hot outside, and comparing this with (say) Wirecutter's top two-hose pick. Confirm?

Why Agent Foundations? An Overly Abstract Explanation

Curated. Laying out a full story for why the work you're doing is solving AI alignment is very helpful, and this framing captures different things from other framings (e.g. Rocket Alignment, Embedded Curiosities, etc.). Also it's simply written and mercifully short, relative to other such things. Thanks for this step in the conversation.

ELK Thought Dump

Ah, very good point. How interesting…

(If I’d concretely thought of transferring knowledge between a bird and a dog this would have been obvious.)

ELK Thought Dump

Solomonoff's theory of induction, along with the AIXI theory of intelligence, operationalize knowledge as the ability to predict observations.

Maybe this is what knowledge is. But I’d like to try coming up with at least one alternative. So here goes!

I want to define knowledge as part of an agent.

  • A system contains knowledge if the agent who built it can successfully attain its goals in its likely environments by using that system to figure out which of its actions will lead to outcomes the agent wants.
  • When comparing different systems that allow an agent to achieve its goals, there is a Pareto frontier of how much knowledge is in the system, depending on how it helps in different environments.
  • A USB stick with blueprints for how to build nukes, in an otherwise lifeless universe, does not contain “knowledge”, because nobody ever “knows” it. (It has knowledge in it the way a tree’s rings do - it contains info that an agent like myself can turn into knowledge.)
  • I can store my knowledge outside of me. I can write your address down, forget it from my brain, pick it up later, and I still have that knowledge, stored on the paper in my office.

To find out how much knowledge Alice has, I can run her in lots of environments and see what she is able to accomplish by her standards.

Alice “knows” a certain amount about cars if she can use one to drive to the store to buy food. She knows more if she can use a broken car to do the same.

To compare Alice’s knowledge to Bob’s, I can give Alice Bob’s preferences, run Alice in lots of environments, and see what she is able to accomplish by Bob’s standards. 
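As a rough illustration of the comparison above, here is a toy Python sketch. Everything in it is an assumption made for the example: agents are modeled as sets of skills, environments as maps from an outcome to the skill needed to reach it, and preferences as a value per outcome. It only shows the shape of "give Alice Bob's preferences and see what she accomplishes by Bob's standards".

```python
# Toy model (all names and numbers are hypothetical, not from the comment).
# An environment maps each available outcome to the skill needed to reach it.
environments = [
    {"food": "drive_car"},        # working car, store nearby
    {"food": "fix_car"},          # the car is broken
    {"coffee": "find_cafe"},      # only a cafe is reachable
]

def best_outcome_value(skills, preferences, env):
    """Value of the best outcome the agent can reach in one environment."""
    reachable = [
        preferences.get(outcome, 0)
        for outcome, needed_skill in env.items()
        if needed_skill in skills
    ]
    return max(reachable, default=0)

def knowledge_score(skills, preferences, envs):
    """Total accomplishment across environments, judged by `preferences`."""
    return sum(best_outcome_value(skills, preferences, env) for env in envs)

alice_skills = {"drive_car", "fix_car"}
bob_skills = {"drive_car", "find_cafe"}
bob_preferences = {"food": 1, "coffee": 3}

# Score both agents by Bob's standards, so the two numbers are comparable.
alice_by_bob = knowledge_score(alice_skills, bob_preferences, environments)
bob_by_bob = knowledge_score(bob_skills, bob_preferences, environments)
```

In this toy setup Alice does better than Bob in the broken-car environment and worse in the cafe one, so which of them "knows more" depends on the environments and preferences chosen.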

To give Alice’s knowledge to a paperclip maximizer, I ask what a paperclip maximizer wants that Alice can help with. Perhaps Alice knows the location of a steel mine that Clippy doesn’t.

When she can outperform Clippy given the same resources, she knows something Clippy doesn’t.

To train a system to “extract” knowledge from Alice and “give” it to Clippy, I need to modify Clippy to do as well in those environments. Then Clippy will “know” what Alice knows.

How do I modify Clippy? I don’t know. So let’s first brute force the heck out of it. Make every physically possible alteration to Clippy, and run each in all possible environments. All those who do as well as Clippy in all environments, and also outperform Alice, have “learned” what Alice knows.
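To make the brute-force idea concrete, here is a minimal Python sketch under heavy simplifying assumptions. An "agent" is just a set of facts it knows, an "alteration" is any subset of a small fact universe, and the score (by Clippy's standard) is 1 if the agent knows where the steel is in that environment. I also read "outperform Alice" as "do at least as well as Alice in every environment"; that reading, and every name below, is an assumption for illustration.

```python
from itertools import combinations

# Hypothetical fact universe: one possible steel mine location per fact.
FACTS = ("mine_north", "mine_south", "mine_east")
# One environment per fact: the steel is only at that mine.
ENVIRONMENTS = FACTS

def score(agent, env):
    """Clippy's standard: 1 if the agent knows where the steel is, else 0."""
    return 1 if env in agent else 0

def all_alterations(facts):
    """Enumerate every 'altered Clippy': here, every subset of the facts."""
    for size in range(len(facts) + 1):
        for subset in combinations(facts, size):
            yield frozenset(subset)

def learned_what_alice_knows(clippy, alice):
    """Keep alterations that never do worse than Clippy or than Alice."""
    survivors = []
    for candidate in all_alterations(FACTS):
        no_worse_than_clippy = all(
            score(candidate, env) >= score(clippy, env) for env in ENVIRONMENTS
        )
        no_worse_than_alice = all(
            score(candidate, env) >= score(alice, env) for env in ENVIRONMENTS
        )
        if no_worse_than_clippy and no_worse_than_alice:
            survivors.append(candidate)
    return survivors

clippy = frozenset({"mine_north"})
alice = frozenset({"mine_north", "mine_south"})  # knows a mine Clippy doesn't
survivors = learned_what_alice_knows(clippy, alice)
```

With three facts there are only eight alterations to check. The real version ranges over every physically possible change to Clippy and every possible environment, which is why this is a thought experiment rather than a procedure anyone could run.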

I’d bet there’s a more sensible algorithm to run, but I won’t reach for it now.


This was a fun question to answer.

I’m not sure what it would look like to have successfully answered the question, so I can’t tell if I did.

Oli was asking me how to get knowledge from one agent to another yesterday, and my first idea didn’t even have the right type signature, so I wanted to generate another proposal.

I’ll ponder what it would look like to succeed, then I can come back and grade my answer.

Late 2021 MIRI Conversations: AMA / Discussion

Eliezer, when you told Richard that your probability of a successful miracle is very low, you added the following note:

Though a lot of that is dominated, not by the probability of a positive miracle, but by the extent to which we seem unprepared to take advantage of it, and so would not be saved by one.

I don't mean to ask for positive fairy tales when I ask: could you list some things you could see in the world that would cause you to feel that we were well-prepared to take advantage of one if we got one?

My obvious quick guess would be "I know of an ML project that made a breakthrough as impressive as GPT-3 and this is secret from the outside world, and the organization is keenly interested in alignment". But I am also interested in broader and less obvious ones. For example, if the folks around here had successfully made a COVID vaccine, I think that would likely require us to be in a much more competent and responsive situation. Alternatively, if folks made other historic scientific breakthroughs guided by some model of how it helps prevent AI doom, I'd feel more like this power could be turned to relevant directions.

Anyway, these are some of the things I quickly generate, but I'm interested in what comes to your mind?

Late 2021 MIRI Conversations: AMA / Discussion

Eliezer and Nate, my guess is that most of your perspective on the alignment problem for the past several years has come from the thinking and explorations you've personally done, rather than reading work done by others.

But, if you have read interesting work by others that's changed your mind or given you helpful insights, what has it been? Some old CS textbook? Random Gwern articles? An economics textbook? Playing around yourself with ML systems?
