
Much concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is "why would we give an AI goals anyway?" I think there are good reasons to expect goal-oriented behavior, and I've been on that side of a lot of arguments. But I don't think the issue is settled, and it might be possible to get better outcomes by directly specifying what actions are good. I flesh out one possible alternative here.

(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments. Big-picture discussion should probably stay here.)

3 comments

It seems that, if desired, the overseer could also set their behaviour and intentions so that the approval-directed agent acts as we would want an oracle or tool to act. This is a nice feature.

I think Nick Bostrom and Stuart Armstrong would also be interested in this, and might have good feedback for you.

High-level feedback: this is a really interesting proposal, and looks like a promising direction to me! Most of my inline comments on Medium are more critical, but that doesn't reflect my overall assessment.