Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. In this situation, a base optimizer creates a second optimizer, called a mesa-optimizer. The primary reference work for this concept is Hubinger et al.'s "Risks from Learned Optimization in Advanced Machine Learning Systems".

Example: Natural selection is an optimization process that optimizes for reproductive fitness. Natural selection produced humans, who are themselves optimizers, capable of pursuing goals that no longer correlate reliably with reproductive fitness. Humans are therefore mesa-optimizers of natural selection.

In the context of AI alignment, the concern is that a base optimizer (e.g., a gradient descent process) may produce a learned model that is itself an optimizer, and that has unexpected and undesirable properties. Even if the gradient descent process is in some sense "trying" to do exactly what human developers want, the resultant mesa-optimizer will not typically be trying to do the exact same thing.[1]
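The distinction can be made concrete with a toy sketch. The Python below is purely illustrative (it is not taken from the referenced paper, and names such as base_objective, make_mesa_optimizer, and internal_target are hypothetical): a base optimizer tunes a model's parameter by trial and error, while the model it selects is itself an optimizer that searches over actions for its own internal objective.

```python
# Illustrative toy only: a "base optimizer" (random search over a parameter)
# produces a model that is itself an optimizer (it searches over actions at
# run time). The base optimizer never inspects the model's internal objective;
# it only scores the actions the model happens to take.
import random

def base_objective(action: float) -> float:
    # What the base optimizer scores: actions near 3.0 are best.
    return -(action - 3.0) ** 2

def make_mesa_optimizer(internal_target: float):
    # The "learned model": at run time it optimizes its own objective,
    # picking the candidate action closest to its internal target.
    def policy() -> float:
        candidates = [i * 0.1 for i in range(100)]
        return max(candidates, key=lambda a: -(a - internal_target) ** 2)
    return policy

# Base optimization loop: crude random search over the model's parameter,
# scored only by the base objective of the resulting behavior.
best_target, best_score = None, float("-inf")
for _ in range(200):
    target = random.uniform(0.0, 10.0)      # candidate "parameters"
    action = make_mesa_optimizer(target)()  # the model runs its own search
    score = base_objective(action)
    if score > best_score:
        best_target, best_score = target, score

print(f"learned internal target: {best_target:.2f}, base score: {best_score:.3f}")
```

On the training distribution the two objectives line up, but the deployed model keeps optimizing its own internal target rather than the base objective, which is the kind of gap the alignment concern points at.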
Previously, work under this concept was called Inner Optimizer or Optimization Daemons.
Wei Dai brings up a similar idea in an SL4 thread.[2]

The optimization daemons article on Arbital was published probably in 2016.[1]
Jessica Taylor wrote two posts about daemons while at MIRI:
Some posts that reference optimization daemons:
Related ideas:
References

1. "Optimization daemons". Arbital.
2. Wei Dai. '"friendly" humans?' December 31, 2003.