I was born in 1962 (so I’m in my 60s). I was raised rationalist, more or less, before we had a name for it. I went to MIT, and have a bachelor’s degree in philosophy and linguistics, and a master’s degree in electrical engineering and computer science. I got married in 1991, and have two kids. I live in the Boston area. I’ve worked as various kinds of engineer: electronics, computer architecture, optics, robotics, software.
Around 1992, I was delighted to discover the Extropians. I’ve enjoyed being in those kinds of circles ever since. My experience with the Less Wrong community has been “I was just standing here, and a bunch of people gathered, and now I’m in the middle of a crowd.” A very delightful and wonderful crowd, just to be clear.
I’m signed up for cryonics. I think it has a 5% chance of working, which is either very small or very large, depending on how you think about it.
I may or may not have qualia, depending on your definition. I think that philosophical zombies are possible, and I am one. This is a very unimportant fact about me, but it seems to incite a lot of conversation with people who care.
I am reflectively consistent, in the sense that I can examine my behavior and desires, and understand what gives rise to them, and there are no contradictions I’m aware of. I’ve been that way since about 2015. It took decades of work and I’m not sure if that work was worth it.
Inner alignment failure is a phenomenon that has happened in existing AI systems, weak as they are. So we know it can happen. We are on track to build many superhuman AI systems. Unless something unexpectedly good happens, eventually we will build one that has a failure of inner alignment. And then it will kill us all. Does the probability of any given system failing inner alignment really matter?
I have some experience designing systems for high reliability and resistance to adversaries. I feel like I’ve seen this kind of thinking before.
Your current line of thinking is at a stage I would call “pretheoretical noodling around.” I don’t mean any disrespect; all design has to go through this stage. But you’re not going to find any good references, or come to any conclusions, if you stay there. A next step is to settle on a model of what you want to get done and of what capabilities the adversaries have. You need some bounds on the adversaries; otherwise nothing can work. And of course you need some bounds on what the system does, and how reliably. Once you’ve got this, you can either figure out how to do it, or prove that it can’t be done.
For example, there are ways of designing hardware that is reliable under the assumption that at most N transistors are corrupt.
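To make that concrete, here’s a toy software analogue (a minimal sketch of the voting principle, not any particular hardware construction): run 2f + 1 redundant copies of a computation and take a majority vote over their outputs, which tolerates up to f corrupt copies. The `good_adder` and `bad_adder` functions below are made up for illustration.

```python
from collections import Counter

def majority_vote(replicas, inputs):
    # Run every replica on the same inputs and take the majority of their
    # outputs. With 2f + 1 replicas, up to f corrupt replicas are tolerated.
    outputs = [replica(*inputs) for replica in replicas]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no strict majority: too many corrupt replicas")
    return value

def good_adder(a, b):
    return a + b

def bad_adder(a, b):   # stands in for a corrupted unit (e.g. a stuck-at fault)
    return a + b + 1

# f = 1, so we need 2*1 + 1 = 3 replicas; one of them is corrupt.
print(majority_vote([good_adder, good_adder, bad_adder], (2, 3)))  # -> 5
```

This is the same idea as triple modular redundancy; the real hardware constructions are more involved, but the voting principle is the same.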
The problem of reaching agreement among a number of actors, some of whom are corrupt, is known as the Byzantine generals problem. It is well studied, and you may find it interesting.
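If it helps, here’s a minimal Python sketch of the classic oral-messages algorithm OM(m) from the Lamport, Shostak, and Pease paper, which gets the loyal generals to agree as long as fewer than a third of them are traitors (n ≥ 3f + 1). The node names and the coin-flip traitor behavior are my own stand-ins, just for illustration.

```python
import random
from collections import Counter

DEFAULT = "retreat"   # fallback value when votes tie

def send(sender, value, traitors):
    # A loyal node relays the value faithfully; a traitor may send anything.
    return random.choice(["attack", "retreat"]) if sender in traitors else value

def om(m, commander, value, lieutenants, traitors):
    """Oral Messages algorithm OM(m): returns each lieutenant's decision."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, value, traitors) for lt in lieutenants}
    if m == 0:
        return received

    # Step 2: each lieutenant relays what it received to the others via OM(m-1).
    relayed = {lt: {} for lt in lieutenants}   # relayed[i][j]: what i heard from j
    for peer in lieutenants:
        others = [lt for lt in lieutenants if lt != peer]
        sub = om(m - 1, peer, received[peer], others, traitors)
        for lt in others:
            relayed[lt][peer] = sub[lt]

    # Step 3: each lieutenant majority-votes over the direct and relayed values.
    decided = {}
    for lt in lieutenants:
        votes = [received[lt]] + [relayed[lt][p] for p in lieutenants if p != lt]
        (top, n_top), *rest = Counter(votes).most_common()
        decided[lt] = DEFAULT if rest and rest[0][1] == n_top else top
    return decided

# 4 generals, 1 traitor: n >= 3f + 1, so the loyal lieutenants end up agreeing,
# whether the traitor is the commander or one of the lieutenants.
print(om(1, "G0", "attack", ["G1", "G2", "G3"], traitors={"G2"}))
```

With four generals and one traitor the loyal lieutenants reach the same decision no matter which node is the traitor; with only three generals and oral messages, no protocol can guarantee that.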
I’m also interested in this topic, and I look forward to seeing where this line of thinking takes you.