Misaligned Incentives
In much the same way that AI systems may have perverse incentives, so do the [AI safety researchers]. They are [humans]. They need to make money, [feed themselves, and attract partners]. [Redacted and redacted even just got married.] This type of accountability to [personal] interests is not perfectly in line with doing what is good for human interests. Moreover, [AI safety researchers are often] technocrats whose values and demographics do not represent humanity particularly well. Optimizing for the goals that the [AI safety researchers] have is not the same thing as optimizing for human welfare. Goodhart’s Law applies.

I feel pretty similarly about most of the other arguments in this post.

Tbc I think there are plenty of things one could reasonably critique scaling labs about, I just think the argumentation in this post is by and large off the mark, and implies a standard that if actually taken literally would be a similarly damning critique of the alignment community.

(Conflict of interest notice: I work at Google DeepMind.)

Reply

[-]scasper2y54

Thanks. I agree that the points apply to individual researchers. But I don't think that it applies in a comparably worrisome way because individual researchers do not have comparable intelligence, money, and power compared to the labs. This is me stressing the "when put under great optimization pressure" of Goodhart's Law. Subtle misalignments are much less dangerous when there is a weak optimization force behind the proxy than when there is a strong one.

Reply