If it's read moral philosophy, it should have some notion of what the words "human values" mean.
GPT-3 and systems like it are trained to mimic human discourse. Even if, in the limit of arbitrary computational power, such a system managed to encode an implicit representation of human values somewhere in its internal state, in actual practice nothing would tie that representation to the phrase "human values". Moral philosophy is written by (confused) humans, and human-written text does not use the phrase "human values" in the consistent, coherent manner that would be required to infer its use as a label for a fixed concept.
1 and 2 are hard to succeed at without making a lot of progress on 4
It's not obvious to me why this ought to be the case. Could you elaborate?
If there's some kind of measure of "observer weight" over the whole mathematical universe, we might already be much larger than 1/3^^^3 of it, so the total utilitarian can only gain so much.
Could you provide some intuition for this? Naively, I'd expect our "observer measure" over the space of mathematical structures to be 0.