The "alignment problem" that humanity faces as its urgent task is precisely the problem of aligning the cognitive work that can be leveraged to prevent the proliferation of technology that destroys the world. Once you solve that, humanity can afford to take as much time as it needs to solve everything else.
OK, I disagree strongly with that strategy. You're basically saying that your aim is not to design ethical/friendly/aligned AI, but to design AI that can take over the world without killing anyone. Then, once that is accomplished, you'll settle down to figure out how that unlimited power would best be used.
To put it another way: Your optimistic scenario is one in which the organization that first achieves AGI uses it to take over the world, installs a benevolent interim regime that monopolizes access to AGI without itself making a deadly mistake, and eventually figures out how to implement CEV (for example); only then is it finally safe to have autonomous AGI.
I have a different optimistic scenario: We definitively figure out the theory of how to implement CEV before AGI even arises, and then spread that knowledge widely, so that whoever in the world first achieves AGI will already know what they should do with it.
Both these scenarios are utopian in different ways. The first one says that flawed humans can directly wield superintelligence for a protracted period without screwing things up. The second one says that flawed humans can fully figure out how to safely wield superintelligence before it even arrives.
Meanwhile, in reality, we've already proceeded an unknown distance up the curve towards superintelligence, but none of the organizations leading the way has much of a plan for what happens if their creations escape their control.
In this situation, I say that people whose aim is to create ethical/friendly/aligned superintelligence should focus on solving that problem. Leave the techno-military strategizing to the national security elites of the world. It's not a topic you can avoid completely, but in the end it's not your job to figure out how mere humans can safely and humanely wield superhuman power. It's your job to design an autonomous superhuman power that is intrinsically safe and humane. To that end we have CEV, we have June Ku's work, and more. We should be focusing there, while remaining engaged with developments in mainstream AI, like language models. That's my manifesto.
The "stable period" is supposed to be a period in which AGI already exists, but nothing like CEV has yet been implemented, and yet "no one can destroy the world with AGI". How would that work? How do you prevent everyone in the whole wide world from developing unsafe AGI during the stable period?
Thank you for the long reply. The 2017 document postulates an "acute risk period" in which people don't yet know how to align AGI, and then a "stable period" once alignment theory is mature.
So if I'm getting the gist of things: rather than working outright on the creation of a human-friendly superhuman AI, MIRI decided to focus on developing a more general theory and practice of alignment; then, once alignment theory was sufficiently mature and correct, it could be applied to the specific crucial case of aligning superhuman AI with extrapolated human volition.
But what's happened is that we're racing towards superhuman AI while the general theory of alignment is still crude, which is a failure for the strategy of prioritizing general alignment theory over the specific task of CEV.
Is that vaguely what happened?
Eliezer and Nate feel that their past alignment research efforts failed
I find this a little surprising. If someone had asked me what MIRI's strategy is, I would have said that the core of it was still something like CEV, with topics like logical induction and new decision theory paradigms as technical framework issues. I mean, part of the MIRI paradigm has always been that AGI alignment is grounded in how the human brain works, right? The mechanics of decision-making in human brains are the starting point in constructing the mechanics of decision-making in an AGI that humans would call 'aligned'. And I would have thought that identifying how to do this was still just research in progress in many directions, rather than something that had hit a dead end.
Does that paper actually mention any overall models of the human mind? It has a list of ingredients, but does it say how they should be combined?
Seems like there's a difference between the viability of AI, and the ability of AI to shape a randomized environment. To have AI, you just need stable circuits; but to have an AI that can shape its world, you need a physics that allows observation and manipulation... It's remarkable that googling "thermodynamics of the game of life" turns up zero results.
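As a toy illustration of that distinction (my own sketch, not anything from the thread): in Conway's Game of Life, a still life like the block is a "stable circuit" that merely persists, while a glider propagates through its environment, the minimal prerequisite for interacting with and shaping it.

```python
from collections import Counter

def step(cells):
    """Advance one Game of Life generation.
    `cells` is a set of (x, y) coordinates of live cells."""
    # Count live neighbours of every cell adjacent to a live cell.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is live next generation if it has exactly 3 live
    # neighbours, or is currently live with exactly 2.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in cells)}

# A 2x2 block is a still life: it is its own successor.
block = {(0, 0), (0, 1), (1, 0), (1, 1)}
assert step(block) == block

# A glider is not stable in place: after 4 steps it recurs,
# translated by (1, 1) -- it moves through its world.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
g4 = glider
for _ in range(4):
    g4 = step(g4)
assert g4 == {(x + 1, y + 1) for (x, y) in glider}
```

Of course, neither pattern observes anything; the comment's point stands that mere stability is a much weaker requirement than the thermodynamics needed for observation and manipulation.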