AI ALIGNMENT FORUM

Utility Functions

Edited by the gears to ascension, Multicore, abramdemski, steven0461, Ruby, et al. Last updated 30th Dec 2024.

A utility function is a function that assigns numerical values ("utilities") to outcomes in such a way that outcomes with higher utilities are always preferred to outcomes with lower utilities. The absence of exploitable holes in the preference ordering is part of the definition, and it is what separates utility from mere reward.
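As a minimal sketch of this definition (the outcome names and numbers are invented for illustration and come from no source), a utility function over a finite set of outcomes can be written in Python as a mapping from outcomes to numbers, with preference read off by comparing values:

```python
# Hypothetical outcomes and utilities, chosen only for illustration.
utility = {"apple": 3.0, "banana": 2.0, "cherry": 1.0}

def prefers(a, b):
    """Return True if outcome a is strictly preferred to outcome b."""
    return utility[a] > utility[b]

def best(options):
    """Pick the most-preferred outcome among the available options."""
    return max(options, key=utility.__getitem__)

# Preferences derived this way are automatically transitive and acyclic,
# because they inherit the ordering of the real numbers.
ranking = sorted(utility, key=utility.get, reverse=True)
```

Because every comparison reduces to comparing two numbers, there is no way to build a preference cycle out of `prefers`, which is the sense in which the ordering has no exploitable holes.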

See also: Complexity of Value, Decision Theory, Game Theory, Orthogonality Thesis, Utilitarianism, Preference, Utility, VNM Theorem

Utility functions do not work very well in practice as a model of individual humans. Human drives are not coherent, nor is there any reason to think they would converge to a utility-function-grade level of reliability (Thou Art Godshatter), and even people with a strong interest in the concept have trouble working out even approximately what their own utility function is (Post Your Utility Function). Furthermore, humans appear to calculate reward and loss separately: adding one to the other does not predict their behavior accurately, and thus human reward is not human utility. This makes humans highly exploitable, and not being exploitable is a minimum requirement for having a coherent utility function.
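The link between incoherent preferences and exploitability is commonly illustrated with a "money pump". The sketch below is a toy version under stated assumptions: the items, the fee, and the rule that the agent always pays to swap to an item it strictly prefers are all hypothetical, and nothing here is meant as a model of actual human behavior.

```python
# Hypothetical cyclic preferences: A over B, B over C, C over A.
# No utility function can represent these, since it would need
# u(A) > u(B) > u(C) > u(A).
prefers_over = {"A": "B", "B": "C", "C": "A"}  # each key is preferred to its value

def money_pump(start_item, start_money, fee, rounds):
    """Repeatedly sell the agent an item it prefers to its current one."""
    item, money = start_item, start_money
    for _ in range(rounds):
        # Find the item the agent strictly prefers to what it holds now.
        better = next(k for k, v in prefers_over.items() if v == item)
        # Assumption: the agent accepts every such "upgrade" for the fee.
        item, money = better, money - fee
    return item, money

final_item, final_money = money_pump("A", start_money=10, fee=1, rounds=3)
# After three trades the agent holds "A" again but has paid three fees.
```

An agent whose choices follow a utility function cannot be walked around a cycle like this, because each paid trade would have to strictly increase a number that ends where it started.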

pjeby posits humans' difficulty in understanding their own utility functions as the root of akrasia.

However, utility functions can be a useful model for dealing with humans in groups, e.g. in economics.

The VNM Theorem tag is likely to be a strict subtag of the Utility Functions tag, because the VNM theorem establishes when preferences can be represented by a utility function, but a post discussing utility functions may or may not discuss the VNM theorem/axioms.
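For context, the VNM theorem says that preferences over lotteries satisfying completeness, transitivity, continuity, and independence can be represented by the expected value of some utility function, and that this function is unique up to positive affine transformation. A rough Python sketch of that representation follows; the outcomes, probabilities, and the constants 10 and 3 are illustrative assumptions only.

```python
# A lottery is a list of (probability, outcome) pairs.
# Outcomes and utilities are hypothetical.
u = {"win": 1.0, "draw": 0.5, "lose": 0.0}

def expected_utility(lottery, util):
    """Probability-weighted average utility of a lottery."""
    return sum(p * util[outcome] for p, outcome in lottery)

safe = [(1.0, "draw")]
risky = [(0.6, "win"), (0.4, "lose")]

# A positive affine transformation u' = a*u + b (with a > 0)
# represents the same preferences over lotteries.
u_scaled = {k: 10.0 * v + 3.0 for k, v in u.items()}

prefers_risky = expected_utility(risky, u) > expected_utility(safe, u)
prefers_risky_scaled = (
    expected_utility(risky, u_scaled) > expected_utility(safe, u_scaled)
)
```

Both comparisons agree, which reflects why the absolute scale and zero point of a VNM utility function carry no meaning on their own.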

Because utility functions arise from VNM rationality, they may still be relevant to understanding intelligent systems even when a system does not explicitly store a utility function anywhere, since reducing the rate of exploitable errors should eventually push the system toward utility-function-like guarantees.

Posts tagged Utility Functions
An Orthodox Case Against Utility Functions (abramdemski, 6y): 61 karma, 45 comments
Coherence arguments do not entail goal-directed behavior (Rohin Shah, 7y): 54 karma, 50 comments
Utility ≠ Reward (Vlad Mikulik, 6y): 55 karma, 16 comments
Why Not Subagents? (johnswentworth, David Lorell, 2y): 55 karma, 14 comments
Bayesian Utility: Representing Preference by Probability Measures (Vladimir_Nesov, 16y): 10 karma, 0 comments
Why Subagents? (johnswentworth, 6y): 47 karma, 12 comments
Shard Theory: An Overview (David Udell, 3y): 45 karma, 2 comments
Ngo and Yudkowsky on AI capability gains (Eliezer Yudkowsky, Richard_Ngo, 4y): 49 karma, 46 comments
[Question] Why The Focus on Expected Utility Maximisers? (DragonGod, Scott Garrabrant, 3y): 33 karma, 1 comment
Coherence arguments imply a force for goal-directed behavior (KatjaGrace, 4y): 38 karma, 18 comments
Research Agenda v0.9: Synthesising a human's preferences into a utility function (Stuart_Armstrong, 6y): 21 karma, 18 comments
Consequentialism & corrigibility (Steven Byrnes, 4y): 30 karma, 9 comments
Comparing Utilities (abramdemski, 5y): 29 karma, 15 comments
Game Theory without Argmax [Part 1] (Cleo Nardo, 2y): 18 karma, 1 comment
Using expected utility for Good(hart) (Stuart_Armstrong, 7y): 16 karma, 1 comment