Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. In this situation, a base optimizer creates a second optimizer, called a mesa-optimizer. The primary reference work for this concept is Hubinger et al.'s "Risks from Learned Optimization in Advanced Machine Learning Systems".

Example: Natural selection is an optimization process that optimizes for reproductive fitness. Natural selection produced humans, who are themselves optimizers, capable of pursuing goals that no longer correlate reliably with reproductive fitness. Humans are therefore mesa-optimizers of natural selection.

In the context of AI alignment, the concern is that a base optimizer (e.g., a gradient descent process) may produce a learned model that is itself an optimizer, and that has unexpected and undesirable properties. Even if the gradient descent process is in some sense "trying" to do exactly what human developers want, the resultant mesa-optimizer will not typically be trying to do the exact same thing.[1]
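The distinction can be made concrete with a toy sketch. The Python below is purely illustrative (it is not taken from the referenced paper, and names such as base_objective, make_mesa_optimizer, and internal_target are hypothetical): a base optimizer tunes a model's parameter by trial and error, while the model it selects is itself an optimizer that searches over actions for its own internal objective.

```python
# Illustrative toy only: a "base optimizer" (random search over a parameter)
# produces a model that is itself an optimizer (it searches over actions at
# run time). The base optimizer never inspects the model's internal objective;
# it only scores the actions the model happens to take.
import random

def base_objective(action: float) -> float:
    # What the base optimizer scores: actions near 3.0 are best.
    return -(action - 3.0) ** 2

def make_mesa_optimizer(internal_target: float):
    # The "learned model": at run time it optimizes its own objective,
    # picking the candidate action closest to its internal target.
    def policy() -> float:
        candidates = [i * 0.1 for i in range(100)]
        return max(candidates, key=lambda a: -(a - internal_target) ** 2)
    return policy

# Base optimization loop: crude random search over the model's parameter,
# scored only by the base objective of the resulting behavior.
best_target, best_score = None, float("-inf")
for _ in range(200):
    target = random.uniform(0.0, 10.0)      # candidate "parameters"
    action = make_mesa_optimizer(target)()  # the model runs its own search
    score = base_objective(action)
    if score > best_score:
        best_target, best_score = target, score

print(f"learned internal target: {best_target:.2f}, base score: {best_score:.3f}")
```

On the training distribution the two objectives line up, but the deployed model keeps optimizing its own internal target rather than the base objective, which is the kind of gap the alignment concern points at.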
Previously, work under this concept was called Inner Optimizer or Optimization Daemons.
Wei Dai brings up a similar idea in an SL4 thread.[2]

The optimization daemons article on Arbital was published probably in 2016.[1]
Jessica Taylor wrote two posts about daemons while at MIRI:
Some posts that reference optimization daemons:
Related ideas:
References

1. "Optimization daemons". Arbital.
2. Wei Dai. '"friendly" humans?' December 31, 2003.