Brute force searching for alignment

by Donald Hobson

27th Jun 2021


Daniel Kokotajlo (4y)

OK, now I get what you are saying! Interesting. I am skeptical that this will work for most alignment problems, possibly because they lack a simple conceptual core. In particular, I doubt that corrigibility and non-deceptiveness have simple conceptual cores. I hope I'm wrong.


adamShimi (4y)

Well, if you worry that these properties don't have a simple conceptual core, maybe you can do the trick where you try to formalize a subset of them that does have a small conceptual core. That's basically Evan's move with myopia, treated as an easier-to-study subset of non-deceptiveness.


adamShimi (4y)

If I try to rephrase it in my words, your proposal looks like a way to go from partial deconfusion (in the form of an extensive definition, a list of examples of what you want) to full deconfusion (an actual program with the property that you want) through brute force search.

Stated like that, it looks really cool. I wonder whether you would already need an AGI to do the search with a reasonable amount of compute. If so, the worry is that you would have to deconfuse the very thing you want to deconfuse before you could apply this technique, which would make it useless.

Still, I will add this sort of thought experiment to my bag of tools. It's a pretty good argument for extensive definition in a way.
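As a toy illustration of the rephrasing above (a list of examples goes in, a program comes out, found by exhaustive search), here is a minimal sketch. It is not taken from the post or the comments: the tiny expression language, the names `LEAVES`, `BINARY_OPS`, `expressions`, and `brute_force_search`, and the depth limit are all assumptions made for the example, and real alignment properties are nowhere near this simple.

```python
from itertools import product

# Hypothetical toy language: arithmetic expressions over one integer input x.
LEAVES = ["x", "1", "2"]
BINARY_OPS = ["+", "*", "-"]


def expressions(depth):
    """Yield expression strings whose nesting depth is at most `depth`."""
    if depth == 0:
        yield from LEAVES
        return
    smaller = list(expressions(depth - 1))
    yield from smaller
    for op, left, right in product(BINARY_OPS, smaller, smaller):
        yield f"({left} {op} {right})"


def run(expr, x):
    """Evaluate a toy expression at x, with builtins disabled."""
    return eval(expr, {"__builtins__": {}}, {"x": x})


def brute_force_search(examples, max_depth=2):
    """Return the first expression consistent with every (input, output) pair.

    Shallower expressions are tried first, as a crude stand-in for
    "simplest program that fits the examples". Returns None if nothing
    within max_depth fits.
    """
    for depth in range(max_depth + 1):
        for expr in expressions(depth):
            if all(run(expr, x) == y for x, y in examples):
                return expr
    return None


if __name__ == "__main__":
    # Extensive definition: a handful of examples of the behaviour we want.
    examples = [(0, 1), (1, 2), (2, 5), (3, 10)]
    print(brute_force_search(examples))
```

Even in this toy setting the number of candidates grows very quickly with depth, which is the compute worry raised above in miniature.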
