Historically, people worried about extinction risk from artificial intelligence have not seriously considered deliberately slowing down AI progress as a solution. Katja Grace argues that this strategy should be taken more seriously, and that common objections to it are incorrect or exaggerated.
A "good project" in AGI research needs:1) Trustworthy command, 2) Research closure, 3) Strong operational security, 4) Commitment to the common good, 5) An alignment mindset, and 6) Requisite resource levels.
The post goes into detail on what minimal, adequate, and good performance on each dimension looks like.
A fictional story about an AI researcher who leaves an experiment running overnight.
"Human feedback on diverse tasks" could lead to transformative AI, while requiring little innovation on current techniques. But it seems likely that the natural course of this path leads to full blown AI takeover.
Katja Grace provides a list of counterarguments to the basic case for existential risk from superhuman AI systems. She examines potential gaps in arguments about AI goal-directedness, AI goals being harmful, and AI superiority over humans. While she sees these as serious concerns, she doesn't find the case for overwhelming likelihood of existential risk convincing based on current arguments.
The field of AI alignment is growing rapidly, attracting more resources and mindshare each year. As it grows, more people will be incentivized to misleadingly portray themselves or their projects as more alignment-friendly than they are. Adam proposes "safetywashing" as the term for this practice.