Mitchell_Porter — AI Alignment Forum

Do you have an endgame strategy ready?

Once you have minds those minds start perceiving differentiation since they need to extract information from the environment to function.

How can there be information for minds to extract, unless the environment already has some kind of structure?

The Thingness of Things

Mitchell_Porter3y10

I have a theory that belief in a good God is the main delusion of western religion, and belief in a fundamentally undifferentiated reality is the main delusion of eastern religion.

I see no way around the conclusion that differences are real. Experience is part of reality, and experience contains difference. Also, my experience is objectively distinct from yours - I don't know what you had for breakfast today (or indeed if you had any); that act was part of your experience, and not part of mine.

We can divide up the world in different ways, but the undivided world is already objectively differentiated.

The Thingness of Things

Mitchell_Porter3y00

How can self-observation be the cause of my existence as a differentiated being? Don't I have to already exist as a differentiated being, in order to be doing that?

The Thingness of Things

Mitchell_Porter3y11

Are you saying my existence is "undifferentiated" from "the wholeness of the world" so long as no one else is observing me or thinking of me?

The Thingness of Things

Mitchell_Porter3y31

there are only phenomena

Do I only exist because you "reify" me?

Six Dimensions of Operational Adequacy in AGI Projects

Mitchell_Porter4y00

The "alignment problem" humanity has as its urgent task is exactly the problem of aligning cognitive work that can be leveraged to prevent the proliferation of tech that destroys the world. Once you solve that, humanity can afford to take as much time as it needs to solve everything else.

OK, I disagree very much with that strategy. You're basically saying that your aim is not to design ethical/friendly/aligned AI, but to design AI that can take over the world without killing anyone. Then, once that is accomplished, you'll settle down to figure out how that unlimited power would best be used.

To put it another way: Your optimistic scenario is one in which the organization that first achieves AGI uses it to take over the world and install a benevolent interim regime, one that monopolizes access to AGI without itself making a deadly mistake, and that eventually figures out how to implement CEV (for example); only then is it finally safe to have autonomous AGI.

I have a different optimistic scenario: We definitively figure out the theory of how to implement CEV before AGI even arises, and then spread that knowledge widely, so that whoever it is in the world that first achieves AGI, they will already know what they should do with it.

Both these scenarios are utopian in different ways. The first one says that flawed humans can directly wield superintelligence for a protracted period without screwing things up. The second one says that flawed humans can fully figure out how to safely wield superintelligence before it even arrives.

Meanwhile, in reality, we've already proceeded an unknown distance up the curve towards superintelligence, but none of the organizations leading the way has much of a plan for what happens if their creations escape their control.

In this situation, I say that people whose aim is to create ethical/friendly/aligned superintelligence should focus on solving that problem. Leave the techno-military strategizing to the national security elites of the world. It's not a topic that you can avoid completely, but in the end it's not your job to figure out how mere humans can safely and humanely wield superhuman power. It's your job to design an autonomous superhuman power that is intrinsically safe and humane. To that end we have CEV, we have June Ku's work, and more. We should be focusing there, while remaining engaged with the developments in mainstream AI, like language models. That's my manifesto.

Six Dimensions of Operational Adequacy in AGI Projects

Mitchell_Porter4y00

The "stable period" is supposed to be a period in which AGI already exists, but nothing like CEV has yet been implemented, and yet "no one can destroy the world with AGI". How would that work? How do you prevent everyone in the whole wide world from developing unsafe AGI during the stable period?

Six Dimensions of Operational Adequacy in AGI Projects

Mitchell_Porter4y00

Thank you for the long reply. The 2017 document postulates an "acute risk period" in which people don't know how to align, and then a "stable period" once alignment theory is mature.

So if I'm getting the gist of things, rather than focus outright on the creation of a human-friendly superhuman AI, MIRI decided to focus on developing a more general theory and practice of alignment; then, once alignment theory is sufficiently mature and correct, one can focus on applying that theory to the specific crucial case of aligning superhuman AI with extrapolated human volition.

But what's happened is that we're racing towards superhuman AI while the general theory of alignment is still crude, and this is a failure for the strategy of prioritizing general theory of alignment over the specific task of CEV.

Is that vaguely what happened?

Six Dimensions of Operational Adequacy in AGI Projects

Mitchell_Porter4y00

Eliezer and Nate feel that their past alignment research efforts failed

I find this a little surprising. If someone had asked me what MIRI's strategy is, I would have said that the core of it was still something like CEV, with topics like logical induction and new decision theory paradigms as technical framework issues. I mean, part of the MIRI paradigm has always been that AGI alignment is grounded in how the human brain works, right? The mechanics of decision-making in human brains are the starting point in constructing the mechanics of decision-making in an AGI that humans would call 'aligned'. And I would have thought that identifying how to do this was still just research in progress in many directions, rather than something that had hit a dead end.
