Suppose that, like Yudkowsky, you really care about humanity surviving this century but you think that nothing you can do has a decent chance of achieving that.

It's an unfortunate fact of human psychology that, when faced with this kind of situation, people will often do nothing at all instead of the thing which has the highest chance of achieving their goal. Hence, you might give up on alignment research entirely, and either lie in bed all day with paralysing depression, or convert your FAANG income into short-term pleasures. How can we avoid this trap?

It seems we have three options:

  • (1) Change your psychology. This would be the ideal option. If you can do that, then do that. But the historical track record suggests this is really hard.
  • (2) Change your beliefs. This is called "hope", and it's a popular trick among AI doomers. You change your belief from "there's nothing I can do which makes survival likely" to "there's something I can do which makes survival likely".
  • (3) Change your goals. This is what Yudkowsky proposes. You change your goal from "humanity survives this century" to "my actions increase the log-odds that humanity survives this century". Yudkowsky calls this new goal "dignity". The old goal had only two possible values, 0 and 1, but the new goal has possible values anywhere between −∞ and +∞.
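
To make the log-odds in option (3) concrete, here is a minimal sketch. The function names `log_odds` and `dignity_gained` are hypothetical, the choice of bits (base-2 logarithm) as the unit is an assumption, and the probabilities are made up purely for illustration; none of them come from Yudkowsky's post.

```python
import math


def log_odds(p: float) -> float:
    """Log-odds (logit) of probability p, measured in bits (an assumed unit)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log2(p / (1.0 - p))


def dignity_gained(p_before: float, p_after: float) -> float:
    """Change in log-odds of survival when the probability moves from p_before to p_after."""
    return log_odds(p_after) - log_odds(p_before)


# Illustrative numbers only: the same one-percentage-point gain in
# survival probability, starting from two different baselines.
print(dignity_gained(0.01, 0.02))  # from a very low baseline
print(dignity_gained(0.50, 0.51))  # from an even-odds baseline
```

Under these assumptions, moving from 1% to 2% is worth roughly one bit, far more than moving from 50% to 51%. So even when survival stays unlikely, the new goal still registers progress.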

Of course, it's risky to change either your beliefs or your goals, because you might face a situation where the optimal policy after the change differs from the optimal policy before the change. But Yudkowsky thinks that (3) is less optimal-policy-corrupting than (2).

Why's that? Well, if you force yourself to believe something unlikely (e.g. "there's something I can do which makes survival likely"), then the inaccuracy can leak into your other beliefs because your beliefs are connected together by a web of inferences. You'll start making poor predictions about AI, and also make silly decisions.

On the other hand, changing your goal from "survival" to "dignity" is like Trying to Try rather than trying; it's less optimal-policy-corrupting.

4 comments

“Oh, and btw, while you are trying to increase the log-odds that humanity survives this century, don’t do anything stupid and rash that is way out of distribution of normal actions. You are not some God who can do the full utilitarian calculus. If an action you are thinking about is far out of distribution and looks probably bad to a lot of people, it’s likely because it is. In other words, don’t naively take rash actions thinking it’s for the good of humanity. Default to 3/4 utilitarian.”

Connor Leahy’s opinion on the post (55:33): 

Yeah I mostly agree with Connor's interpretation of Death with Dignity.

I know a lot of the community thought it was a bad post, and some thought it was downright infohazardous, but the concept of "death with dignity" is pretty lindy actually. When a group of soldiers is fighting a battle with awful odds, they don't change their belief to "a miracle will save us", they change their goal to "I'll fight till my last breath".

If people find the mindset harmful, then they won't use it. If people find the mindset helpful, then they will use it. But I think everyone should try out the mindset for an hour or two.

Strongly upvoted. I unironically think it's a pretty good distillation (I listened to the original post in the background).
