Value Impact

[-]DanielFilan6y30

It's interesting to me to consider the case of me getting into a PhD program at UC Berkeley, which felt pretty impactful. It wasn't that I intrinsically valued being a PhD student at Berkeley, and it wasn't just that being a PhD student at Berkeley objectively gave any agent greater ability to achieve their goals (although they pay you, so it's true to some extent), it was that it gave me greater ability to achieve my goals by (a) being able to learn more about AI alignment and (b) getting to hang out with my friends and friends-of-friends in the Bay Area. (a) and (b) weren't automatic consequences of being admitted to the program, I had to do some work to make them happen, and they aren't universally valuable. A simplified example of this kind of thing is somebody giving you a non-transferrable $100 gift voucher for GameStop.

[-]sayan6y30

As far as I understand, this post decomoses 'impact' into value impact and objective impact. VI is dependent on some agent's ability to reach arbitrary value-driven goals, while OI depends on any agent's ability to reach goals in general.

I'm not sure if there exists a robust distinction between the two - the post doesn't discuss any general demarcation tool.

Maybe I'm wrong, but I think the most important point to note here is that 'objectiveness' of an impact is defined not to be about the 'objective state of the world' - rather about how 'general to all agents' an impact is.

[-]TurnTrout6y20

VI is dependent on some agent's ability to reach arbitrary value-driven goals, while OI depends on any agent's ability to reach goals in general.

VI depends on the ability to do one kind of goal in particular, like human values. OI depends on goals in general.

I'm not sure if there exists a robust distinction between the two - the post doesn't discuss any general demarcation tool.

If I understand correctly, this is wondering whether there are some impacts that count for ~50% of all agents, or 10%, or .01% - where do we draw the line? It seems to me that any natural impact (that doesn't involve something crazy like "if the goal encoding starts with '0', shut them off; otherwise, leave them alone") either affects a very low percentage of agents or a very high percentage of agents. So, I'm not going to draw an exact line, but I think it should be intuitively obvious most of the time.

Maybe I'm wrong, but I think the most important point to note here is that 'objectiveness' of an impact is defined not to be about the 'objective state of the world' - rather about how 'general to all agents' an impact is

This is exactly it.

Appendix: Contrived Objectives

A natural definitional objection is that a few agents aren't affected by objectively impactful events. If you think every outcome is equally good, then who cares if the meteor hits?

Obviously, our values aren't like this, and any agent we encounter or build is unlikely to be like this (since these agents wouldn't do much). Furthermore, these agents seem contrived in a technical sense (low measure under reasonable distributions in a reasonable formalization), as we'll see later. That is, "most" agents aren't like this.

From now on, assume we aren't talking about this kind of agent.

Notes

Eliezer introduced Pebblesorters in the the Sequences; I made them robots here to better highlight how pointless the pebble transformation is to humans.

In informal parts of the sequence, I'll often use "values", "goals", and "objectives" interchangeably, depending on what flows.

We're going to lean quite a bit on thought experiments and otherwise speculate on mental processes. While I've taken the obvious step of beta-testing the sequence and randomly peppering my friends with strange questions to check their intuitions, maybe some of the conclusions only hold for people like me. I mean, some people don't have mental imagery – who would've guessed? Even if so, I think we'll be fine; the goal is for an impact measure – deducing human universals would just be a bonus.

Objective impact is objective with respect to the agent's values – it is not the case that an objective impact affects you anywhere and anywhen in the universe! If someone finds $100, that matters for agents at that point in space and time (no matter their goals), but it doesn't mean that everyone in the universe is objectively impacted by one person finding some cash!

If you think about it, the phenomenon of objective impact is surprising. See, in AI alignment, we're used to no-free-lunch this, no-universal-argument that; the possibility of something objectively important to agents hints that our perspective has been incomplete. It hints that maybe this "impact" thing underlies a key facet of what it means to interact with the world. It hints that even if we saw specific instances of this before, we didn't know we were looking at, and we didn't stop to ask.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

22

22

Appendix: Contrived Objectives