Raymond Arnold

I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.


AGI safety from first principles: Introduction

I haven't had time to reread this sequence in depth, but I wanted to at least touch on how I'd evaluate it. It seems to be aiming to be both a good introductory sequence and a "complete and compelling case I can for why the development of AGI might pose an existential threat".

The question is who this sequence is for, what its goal is, and how it compares to other writing targeting similar demographics.

Some writing that comes to mind to compare/contrast it with includes:

  • Scott Alexander's Superintelligence FAQ. This is the post I've found most helpful for convincing people (including myself) that yes, AI is just actually a big deal and an extinction risk. It's 8000 words. It's written fairly entertainingly. What I find particularly compelling here are a bunch of factual statements about recent AI advances that I hadn't known about at the time.
  • Tim Urban's Road To Superintelligence series. This is even more optimized for entertainingness. I recall it being a bit more handwavy and making some claims that were either objectionable, or at least felt more objectionable. It's 22,000 words.
  • Alex Flint's AI Risk for Epistemic Minimalists. This goes in a pretty different direction – not entertaining, and not really comprehensive either. It came to mind because it's doing a sort-of-similar thing of "remove as many prerequisites or assumptions as possible". (I'm not actually sure it's that helpful: the specific assumptions it's avoiding making don't feel like issues I expect to come up for most people, and then it doesn't make a very strong claim about what to do.)

(I recall Scott Alexander once trying to run a pseudo-study where he had people read a randomized intro post on AI alignment, I think including his own Superintelligence FAQ and Tim Urban's posts among others, and see how it changed people's minds. I vaguely recall it didn't find that big a difference between them. I'd be curious how this compared)

At a glance, AGI Safety From First Principles seems to be more complete than Alex Flint's piece, and more serious/a-bit-academic than Scott or Tim's writing. I assume it's aiming for a somewhat skeptical researcher, and is meant to not only convince them the problem exists, but give them some technical hooks of how to start thinking about it. I'm curious how well it actually succeeds at that.

Inaccessible information

It strikes me that this post looks like (AFAICT?) a stepping stone towards the Eliciting Latent Knowledge research agenda, which currently has a lot of support/traction. That makes this post fairly historically important.

Some AI research areas and their relevance to existential safety

I've highly voted this post for a few reasons. 

First, this post contains a bunch of other individual ideas I've found quite helpful for orienting. Some examples:

  • Useful thoughts on which term definitions have "staying power," and are worth coordinating around.
  • The zero/single/multi alignment framework.
  • The details on how to anticipate, legitimize, and fulfill governance demands.

But my primary reason was learning Critch's views on what research fields are promising, and how they fit into his worldview. I'm not sure if I agree with Critch, but I think "Figure out what are the best research directions to navigate towards" seems crucially important. It's valuable to have senior AI x-risk researchers lay out how they think about what research is valuable.

I'd like to see similar posts from Paul, Eliezer, etc, (which I expect to have radically different frames). I don't expect everyone to end up converging on a single worldview, but I think the process of smashing the worldviews together can generate useful ideas, and give up-and-coming-researchers some hooks of what to explore.

One confusing thing here is that the initial table doesn't distinguish between "fields that aren't that helpful for existential safety" and "fields which are both helpful-and-harmful to existential safety." I was surprised when I looked at the initial Agent Foundations ranking of "3", which turned out to be much more complex.

Some notes on worldview differences this post highlights.

Disclaimer: these are my own rough guesses about Critch's and MIRI's views, which may not be accurate. It's also focusing on the differences that felt important to me, which I think are somewhat different from how Critch presents things. I'm also using "MIRI" as sort of a shorthand for "some cluster of thinking that's common on LW", which isn't necessaril

My understanding of Critch's paradigm seems fairly different from the MIRI paradigm (which AFAICT expects that the first AGI mover will gain an overwhelming decisive advantage, and meanwhile that interfacing with most existing power structures is... kinda a waste of time, due to them being trapped in bad equilibria that make them inadequate?).

From what I understand of Critch's view, AGI will tend to be rolled out in smaller, less-initially-powerful pieces, and much of the danger of AGI comes from when many different AGIs start interacting with each other, and multiple humans, in ways that get increasingly hard to predict. 

Therefore, it's important for humanity as a whole to be able to think critically and govern itself in scalable ways. I think Critch considers getting humanity to collectively govern itself both more tractable and more important, which leads to more emphasis on domains like ML Fairness.

Some followup work I'd really like to see is more public discussion of the underlying worldview differences here, and the actual cruxes that generate them.

Speaking for myself (as opposed to either Critch or MIRI-esque researchers), "whether our institutions are capable of governing themselves in the face of powerful AI systems" is an important crux for what strategic directions to prioritize. BUT, I've found all the gears that Critch has pointed to here to be helpful for my overall modeling of the world. 

AGI safety from first principles: Introduction

A year later, as we consider this for the 2020 Review, I think figuring out a better name is worth another look.

Another option is "AI Catastrophe from First Principles"

EfficientZero: How It Works

Curated. EfficientZero seems like an important advance, and I appreciate this post's lengthy explanation, broken into sections that made it easy to skim past parts I already understood.

How To Get Into Independent Research On Alignment/Agency

Curated. This post matched my own models of how folk tend to get into independent alignment research, and I've seen some people whose models I trust more endorse the post as well. Scaling good independent alignment research seems very important.

I do like that the post also specifies who shouldn't be going to independent research.

Yudkowsky and Christiano discuss "Takeoff Speeds"

So... I totally think there are people who sort of nod along with Paul, using it as an excuse to believe in a rosier world where things are more comprehensible and they can imagine themselves doing useful things without having a plan for solving the actual hard problems. Those types of people exist. I think there's some important work to be done in confronting them with the hard problem at hand.

But, also... Paul's world AFAICT isn't actually rosier. It's potentially more frightening to me. In Smooth Takeoff world, you can't carefully plan your pivotal act with an assumption that the strategic landscape will remain roughly the same by the time you're able to execute on it. Surprising partial-gameboard-changing things could happen that affect what sort of actions are tractable. Also, dumb, boring ML systems run amok could kill everyone before we even get to the part where recursively self-improving consequentialists eradicate everyone.

I think there is still something seductive about this world – dumb, boring ML systems run amok feels like the sort of problem that is easier to reason about and maybe solve. (I don't think it's actually necessarily easier to solve, but I think it can feel that way, whether it's easier or not). And if you solve ML-run-amok-problems, you still end up dead from recursive-self-improving-consequentialists if you didn't have a plan for them.

But, that seductiveness feels like a different problem to me than what's getting argued about in this dialog. (This post seemed to mostly be arguing on the object level at Paul. I recall a previous Eliezer comment where he complained that Paul kept describing things in language that was easy to round off to "things are easy to deal with" even though Eliezer knew that Paul didn't believe that. That feels more like what the argument here was actually about, but the way the conversation was conducted didn't seem to acknowledge that.)

My current take on some object-level points in this post:

  • It (probably) matters what the strategic landscape looks like in the years leading up to AGI.
  • It might not matter if you have a plan for pivotal acts that you're confident are resilient against the sort of random surprises that might happen in Smooth Takeoff World.
  • A few hypotheses that are foregrounded by this post include:
    • Smooth Takeoff World, as measured in GDP.
      • GDP mostly doesn't seem like it matters except as a proxy, so I'm not that hung up on evaluating this. (That said, the "Bureaucracy and Thielian Secrets" model is interesting, and does provoke some interesting thoughts on how the world might be shaped)
    • Smooth Takeoff World, as measured by "AI-breakthroughs-per-year-or-something".
      • This feels like something that might potentially matter. I agree that AI-breakthroughs-per-year is hard to operationalize, but if AI is able to feed back into AI research that seems strategically relevant. I'm surprised/confused that Eliezer wasn't more interested in exploring this.
    • Abrupt Fast Takeoff World, which is mostly like this one except suddenly someone has a decisive advantage and/or we're all dead.
    • Chunky Takeoff World. Mostly listed for completeness. Maybe there won't be a smooth hyperbolic curve all the way to FOOM; there might be a few discrete advances in between here and there.
  • Eliezer's arguments against Smooth-Takeoff-World generally don't feel as ironclad to me as the arguments about FOOM. AFAICT he also only specified arguments in this post against Smooth-Takeoff-Measured-By-GDP. It seems possible that, e.g., DeepMind could start making AI advances that they use fully internally without running them through external bureaucracy bottlenecks. It's possible that any sufficiently large organization develops its own internal bureaucracy bottlenecks, but also totally possible that all the smartest people at DeepMind talk to each other and the real work gets done in a way that cuts through it.
  • The "Bureaucracy Bottleneck as crux against Smooth Takeoff GDP World" was quite interesting for general worldmodeling, whether or not it's strategically relevant. It does suggest it might be quite bad if the AI ecosystem figured out how to bypass its own bureaucracy bottlenecks.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Update: I originally posted this question over here, then realized this post existed and maybe I should just post the question here. But then it turned out people had already started answering my question-post, so, I am declaring that the canonical place to answer the question.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Can someone give a rough explanation of how this compares to the recent DeepMind Atari-playing AI:


And, for that matter, how both of them compare to the older DeepMind paper:


Are they accomplishing qualitatively different things? The same thing but better?

AMA: Paul Christiano, alignment researcher

Curated. I don't think we've curated an AMA before, and I'm not sure if I have a principled opinion on doing that, but this post seems chock full of small useful insights, and fragments of ideas that seem like they might otherwise take a while to get written up more comprehensively, which I think is good.
