AI ALIGNMENT FORUM

Goodhart's Law

Edited by Ruben Bloom, Vladimir Nesov, et al. last updated 19th Mar 2023

Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, it ceases to be a good proxy. One form of Goodhart is illustrated by the Soviet story of a factory graded on how many shoes it produced (a good proxy for productivity) – it soon began producing enormous numbers of tiny shoes. Useless, but the numbers looked good.

Goodhart's Law is of particular relevance to AI alignment. Suppose you have something that is generally a good proxy for "the stuff that humans care about". It would then be dangerous to have a powerful AI optimize for that proxy: in accordance with Goodhart's Law, the proxy will break down under the optimization pressure.
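The danger can be sketched with a toy model (hypothetical numbers, not from this article): let the proxy be a quantity x, and let the true goal rise with x at first but collapse once x is pushed far outside the range where proxy and goal were observed to correlate.

```python
import math

def true_goal(x):
    # Hypothetical goal function: increases with the proxy x for small x,
    # but decays toward zero when x is pushed to extreme values.
    return x * math.exp(-x / 5.0)

# Mild optimization of the proxy also improves the goal...
assert true_goal(4.0) > true_goal(1.0)
# ...but extreme optimization of the proxy destroys almost all goal value.
assert true_goal(50.0) < true_goal(1.0)
```

A weak optimizer stays in the regime where proxy and goal move together; a powerful optimizer pushes x into the regime where they come apart.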

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

  • Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
  • Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
  • Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
  • Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.
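The regressional case can be simulated directly. In this sketch (a toy model with assumed Gaussian noise, not taken from Garrabrant's post), the proxy equals the true goal plus independent noise; selecting the candidate with the best proxy score therefore also selects for the noise, so the winner's proxy score systematically overstates its true value.

```python
import random

random.seed(0)

def winners_gap(n=1000):
    # True values, and proxy scores equal to the true value plus noise.
    true_values = [random.gauss(0, 1) for _ in range(n)]
    proxy_scores = [v + random.gauss(0, 1) for v in true_values]
    # Select the candidate that looks best according to the proxy.
    best = max(range(n), key=lambda i: proxy_scores[i])
    # How much the winner's proxy score overstates its true value.
    return proxy_scores[best] - true_values[best]

# Averaged over many draws, the gap is positive: optimizing the proxy
# selected for the noise as well as for the goal.
mean_gap = sum(winners_gap() for _ in range(200)) / 200
print(mean_gap > 0)
```

This is the same statistical effect as the "winner's curse": the stronger the selection and the noisier the proxy, the larger the expected gap.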

See Also

  • Adaptation executers
  • Groupthink
  • Information cascade
  • AI Alignment
  • Filtered evidence
  • Superstimulus
  • Cached thought
  • Egalitarianism
  • Modesty argument
  • Signaling
  • Dark arts
  • Rationalization
  • Epistemic hygiene
  • Affective death spiral
  • Scoring rule
Posts tagged Goodhart's Law

  • Goodhart Taxonomy, by Scott Garrabrant (7y, 54 karma, 23 comments)
  • Classifying specification problems as variants of Goodhart's Law, by Victoria Krakovna (6y, 26 karma, 5 comments)
  • Specification gaming examples in AI, by Victoria Krakovna (7y, 18 karma, 8 comments)
  • When is Goodhart catastrophic?, by Drake Thomas and Thomas Kwa (2y, 62 karma, 15 comments)
  • Goodhart's Curse and Limitations on AI Alignment, by Gordon Seidoh Worley (6y, 10 karma, 0 comments)
  • [Question] How does Gradient Descent Interact with Goodhart?, by Scott Garrabrant and Evan Hubinger (6y, 24 karma, 4 comments)
  • Introduction to Reducing Goodhart, by Charlie Steiner (4y, 22 karma, 5 comments)
  • Does Bayes Beat Goodhart?, by Abram Demski (6y, 25 karma, 11 comments)
  • Defeating Goodhart and the "closest unblocked strategy" problem, by Stuart Armstrong (6y, 18 karma, 12 comments)
  • Using expected utility for Good(hart), by Stuart Armstrong (7y, 16 karma, 1 comment)
  • Catastrophic Regressional Goodhart: Appendix, by Thomas Kwa and Drake Thomas (2y, 12 karma, 1 comment)
  • Don't design agents which exploit adversarial inputs, by Alex Turner and Garrett Baker (3y, 32 karma, 33 comments)
  • Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom), by Roger Dearnaley (2y, 14 karma, 0 comments)
  • Embedded Agency (full-text version), by Scott Garrabrant and Abram Demski (7y, 54 karma, 4 comments)
  • Optimization Amplifies, by Scott Garrabrant (7y, 28 karma, 3 comments)