Mesa-Optimization

Yoav Ravid (+25/-10)
Yoav Ravid (+21)
Ruben Bloom (+1674/-62)
Ruben Bloom (+321/-1838)


Mesa-optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. A base optimizer optimizes and creates a mesa-optimizer. Arguably, humans are a mesa-optimizer that arose from the base optimizer of evolution.

Examples

Natural selection is an optimization process that optimizes for reproductive fitness. It produced humans, who are capable of pursuing goals that no longer correlate reliably with reproductive fitness. In this case, humans are optimization daemons of natural selection. In the context of AI alignment, the concern is that an artificial general intelligence exerting optimization pressure may produce mesa-optimizers that break alignment.[1]
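The relationship between a base optimizer and a mesa-optimizer can be illustrated with a toy sketch. Everything here is an assumption made for illustration and does not come from the cited sources: the action set, the target value 3, and the names `base_objective`, `make_model`, and `base_optimizer` are hypothetical. The base optimizer searches over a single parameter `theta`, and each candidate model is itself an optimizer that searches over actions to maximise its own internal (mesa-)objective.

```python
from typing import Callable, Iterable

# Hypothetical action space that every model searches over.
ACTIONS = range(-5, 6)


def base_objective(action: int) -> float:
    """What the base optimizer rewards: actions close to 3 (an arbitrary target)."""
    return -abs(action - 3)


def make_model(theta: int) -> Callable[[], int]:
    """Build a 'learned model' that is itself an optimizer.

    The model does not store an action directly. It runs its own search
    over ACTIONS, maximising an internal mesa-objective set by theta.
    """
    def mesa_objective(action: int) -> float:
        return -abs(action - theta)

    def model() -> int:
        return max(ACTIONS, key=mesa_objective)

    return model


def base_optimizer(candidate_thetas: Iterable[int]) -> int:
    """Select the theta whose model scores best on the base objective."""
    return max(candidate_thetas, key=lambda t: base_objective(make_model(t)()))


best_theta = base_optimizer(range(-5, 6))
chosen_action = make_model(best_theta)()
print(best_theta, chosen_action)
```

In this sketch the base optimizer happens to select a model whose mesa-objective agrees with the base objective. The alignment concern described above is the case where the two objectives agree on the situations used for selection but come apart elsewhere, which this toy example does not attempt to model.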

History

Work on this concept was previously discussed under the names inner optimizer and optimization daemons.

Wei Dai brings up a similar idea in an SL4 thread.[2]

The optimization daemons article on Arbital was probably published in 2016.[3]

Jessica Taylor wrote two posts about daemons while at MIRI:

See also

References

  1. "Optimization daemons". Arbital.
  2. Wei Dai. '"friendly" humans?' December 31, 2003.

External links

Some posts that reference optimization daemons:

optimizing worst-case performance is one of the most likely reasons that I think prosaic AI alignment might turn out to be impossible (if combined with an unlucky empirical situation)." (the phrase "unlucky empirical situation" links to the optimization daemons page on Arbital)

Related ideas
