Complexity of Value


Many human choices can be compressed, by representing them by simple rules - the desire to survive produces innumerable actions and subgoals as we fulfill that desire. But people don't just want to survive - although you can compress many human activities to that desire, you cannot compress all of human existence into it. The human equivalents of a utility function, our terminal values, contain many different elements that are not strictly reducible to one another. William Frankena offered this list of things which many cultures and people seem to value (for their own sake rather than strictly for their external consequences):

Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.

The "etc." at the end is the tricky part, because there may be a great many values not included on this list.

One hypothesis is that natural selection reifies selection pressures as psychological drives, which then continue to execute independently of any consequentialist reasoning in the organism. This may continue without the organism explicitly representing, let alone caring about, the original evolutionary context. Under this view, we have no reason to expect these terminal values to be reducible to any one thing, or to each other.

Complexity of value also runs into underappreciation in the presence of bad metaethics. The local flavor of metaethics could be characterized as cognitivist, without implying "thick" notions of instrumental rationality; in other words, moral discourse can be about a coherent subject matter, without all possible minds and agents necessarily finding truths about that subject matter to be psychologically compelling. An expected paperclip maximizer doesn't disagree with you about morality any more than you disagree with it about "which action leads to the greatest number of expected paperclips"; it is just constructed to find the latter subject matter psychologically compelling but not the former. "But it's just paperclips! What a dumb goal! No sufficiently intelligent agent would pick such a dumb goal!" is a judgment carried out on a local brain that evaluates paperclips as inherently low in the preference ordering. Failing to appreciate this, someone will expect all moral judgments to be automatically reproduced in a sufficiently intelligent agent, since, after all, it would not lack the intelligence to see that paperclips are so obviously inherently low in the preference ordering. This is a particularly subtle species of anthropomorphism and mind projection fallacy.

As values are orthogonal to intelligence, they can freely vary no matter how intelligent and efficient an AGI is [1]. Since human / humane values have high Kolmogorov complexity, a random AGI is highly unlikely to maximize human / humane values. The fragility of value thesis implies that a poorly constructed AGI might, e.g., turn us into blobs of perpetual orgasm. Because of this relevance, the complexity and fragility of value are a major theme of Eliezer Yudkowsky's writings.

Wrongly designing the future because we wrongly encoded human values is a serious and difficult-to-assess type of existential risk. "Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back. And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being. Let go of the steering wheel, and the Future crashes." [2]

Complexity of value poses a problem for AI alignment. If you can't easily compress what humans want into a simple function that can be fed into a computer, it isn't easy to make a powerful AI that does things humans want and doesn't do things humans don't want. Value Learning attempts to address this problem.
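The compression problem can be made concrete with a toy sketch. Everything below is invented for illustration (the world states, the attribute names, and the numbers are not from the article): an optimizer handed a lossy, one-term compression of a value list like Frankena's will confidently pick an outcome that the fuller list rejects.

```python
# Hypothetical world states scored on a few of the many values humans hold.
# All names and scores are made up purely to illustrate the point.
world_states = {
    "flourishing":  {"pleasure": 7,  "freedom": 9, "friendship": 8, "novelty": 6},
    "quiet_life":   {"pleasure": 5,  "freedom": 7, "friendship": 9, "novelty": 3},
    "orgasm_blobs": {"pleasure": 10, "freedom": 0, "friendship": 0, "novelty": 0},
}

def proxy_utility(state):
    """A 'compressed' value function that keeps only one term."""
    return state["pleasure"]

def fuller_utility(state):
    """A (still toy) function tracking several irreducible values."""
    return sum(state.values())

best_by_proxy = max(world_states, key=lambda s: proxy_utility(world_states[s]))
best_by_fuller = max(world_states, key=lambda s: fuller_utility(world_states[s]))

print(best_by_proxy)   # the degenerate outcome wins under the proxy
print(best_by_fuller)  # the richer function prefers genuine flourishing
```

Running this, the proxy optimizer selects `orgasm_blobs` while the fuller function selects `flourishing` - a crude stand-in for the fragility-of-value worry that dropping even one dimension of what we care about can steer an optimizer somewhere we never wanted to go.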
