AI ALIGNMENT FORUM

Value Learning

Written by Joao Fabiano, Roger Dearnaley, et al.; last updated 30th Dec 2024

Value Learning is a proposed method for incorporating human values in an AGI. It involves the creation of an artificial learner whose actions take into account many possible sets of values and preferences, weighted by their likelihood. Value learning could prevent an AGI from having goals detrimental to human values, hence helping in the creation of Friendly AI.
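
In symbols (the notation here is illustrative rather than drawn from any single source), the learner scores a candidate action a by averaging its value under each candidate utility function U_j, weighted by how probable that utility function is given the evidence so far:

    \mathbb{E}[V(a)] \;=\; \sum_{j} P(U_j \mid \text{evidence}) \, \mathbb{E}[U_j \mid a]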

Many ways have been proposed to incorporate human values in an AGI (e.g. Coherent Extrapolated Volition, Coherent Aggregated Volition and Coherent Blended Volition, mostly proposed around 2004-2010). Value learning was suggested in 2011 by Daniel Dewey in ‘Learning What to Value’. Like most authors, he assumes that an artificial agent needs to be intentionally aligned to human goals. First, Dewey argues against a simple use of reinforcement learning to solve this problem, on the basis that it leads to the maximization of specific rewards that can diverge from value maximization; for example, it could suffer from goal misspecification or reward hacking. He instead proposes a maximizer comparable to AIXI, which considers all possible utility functions weighted by their Bayesian probabilities: "[W]e propose uncertainty over utility functions. Instead of providing an agent one utility function up front, we provide an agent with a pool of possible utility functions and a probability distribution P such that each utility function can be assigned probability P(Uj | yxm) given a particular interaction history [yxm]. An agent can then calculate an expected value over possible utility functions given a particular interaction history."
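
The following is a minimal, illustrative sketch of this scheme; it is not code from Dewey's paper, and the names (utility_pool, likelihood, choose_action, and so on) are assumptions made here for concreteness:

    # Sketch of Dewey-style value learning: expected utility is taken over a
    # pool of candidate utility functions, weighted by their posterior
    # probability given the interaction history. Illustrative only.
    import numpy as np

    def posterior_over_utilities(prior, likelihood, history):
        """P(Uj | history): Bayesian update of the prior over utility functions."""
        # likelihood(j, history) is an assumed model of how probable the observed
        # history is if Uj were the utility function humans actually care about.
        weights = np.array([p * likelihood(j, history) for j, p in enumerate(prior)])
        return weights / weights.sum()

    def expected_utility(action, utility_pool, posterior, history):
        """Average an action's value over all candidate utility functions."""
        return sum(p * u(history, action) for u, p in zip(utility_pool, posterior))

    def choose_action(actions, utility_pool, prior, likelihood, history):
        """Pick the action with the highest expected utility under value uncertainty."""
        posterior = posterior_over_utilities(prior, likelihood, history)
        return max(actions, key=lambda a: expected_utility(a, utility_pool, posterior, history))

As the interaction history grows, the posterior should concentrate on utility functions consistent with what the agent has observed about human preferences, which is the sense in which such an agent "learns what to value".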

Nick Bostrom also discusses value learning at length in his book Superintelligence. Value learning is closely related to various proposals for AI-assisted/AI automated Alignment and to value extrapolation research. Since human values are complex and fragile, learning human values well is a challenging problem, much like AI-assisted Alignment (but in a less supervised setting, so actually harder). So it is only a practicable alignment technique for an AGI capable of successfully performing a STEM research program (in Anthropology). Thus value learning is (unusually) an alignment technique that improves as capabilities increase, and it requires roughly an AGI-level threshold of capabilities before it begins to be effective.

One potential challenge is that human values are somewhat mutable, and an AGI could itself affect them.

References

  • Dewey, Daniel (2011). ‘Learning What to Value’.

See Also

  • Friendly AI
  • Complexity of value and Value extrapolation

Posts tagged Value Learning

  • The easy goal inference problem is still hard (Paul Christiano)
  • Humans can be assigned any values whatsoever… (Stuart Armstrong)
  • Model Mis-specification and Inverse Reinforcement Learning (Owain Evans, Jacob Steinhardt)
  • Ambitious vs. narrow value learning (Paul Christiano)
  • Conclusion to the sequence on value learning (Rohin Shah)
  • Intuitions about goal-directed behavior (Rohin Shah)
  • Requirements for a Basin of Attraction to Alignment (Roger Dearnaley)
  • Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) (Roger Dearnaley)
  • 6. The Mutable Values Problem in Value Learning and CEV (Roger Dearnaley)
  • What is ambitious value learning? (Rohin Shah)
  • Normativity (Abram Demski)
  • Learning human preferences: black-box, white-box, and structured white-box access (Stuart Armstrong)
  • The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? (Roger Dearnaley)
  • 2018 AI Alignment Literature Review and Charity Comparison (Larks)
  • Evaluating the historical value misspecification argument (Matthew Barnett)