When you say the human decision procedure causes human values, what I hear is that the human decision procedure (and its surrounding way of describing the world) is more ontologically basic than human values (and their surrounding way of describing the world).

Our decision procedure is "the reason for our values" in the same way that the motion of electric charge in your computer is the reason it plays videogames (even though "the electric charge is moving" and "it's playing a game" might be describing the same physical event). The arrow between them isn't the typical causal arrow between two peers within a single way of describing the world; it's an arrow of reduction/emergence, between things at different levels of abstraction.

Gordon Seidoh Worley (6y)

I think I basically agree with this and think it's right. In some ways you might say focusing too much on "values" acts like a barrier to deeper investigation of the mechanisms at work here, and I think looking deeper is necessary because I expect that optimization against the value abstraction layer alone will result in Goodharting.

avturchin (6y)

It looks like the idea of human values is very contradictory. Maybe we should dissolve it? What about "AI safety" without human values?

Gordon Seidoh Worley (6y)

In some sense that's a direction I might be moving in with my thinking, but there is still something that humans identify as values and care about, so I expect there is a real phenomenon here that needs to be considered to get good outcomes. I expect the default remains a bad outcome if we don't pay attention to whatever it is that makes humans care about stuff. I expect most work today on value learning is not going to get us where we want to go because it's working with the wrong abstractions, and my goal in this work is to dissolve those abstractions and find better ones for our long-term purposes.
