Outer Alignment

Outer alignment asks the question: "What should we aim our model at?" In other words, is the model optimizing for the correct reward, such that there are no exploitable loopholes? It is also known as the reward misspecification problem: if we are unable to correctly tell the AI what we want, then we have failed to specify the reward.
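
As a toy illustration of an exploitable loophole in a specified reward (a minimal sketch; the cleaning-robot environment, policies, and numbers below are invented for this example, not drawn from any real benchmark), consider an agent that is rewarded per item it deposits in a bin rather than for the room actually ending up clean:

```python
# Hypothetical toy environment, invented purely to illustrate reward
# misspecification: the designer intends "leave the room clean", but the
# specified reward is "+1 per item deposited in the bin". Dumping the bin
# back onto the floor and re-depositing the same items exploits the loophole.

def run_episode(policy, steps=20):
    trash_on_floor, trash_in_bin, reward = 3, 0, 0
    for _ in range(steps):
        action = policy(trash_on_floor, trash_in_bin)
        if action == "deposit" and trash_on_floor > 0:
            trash_on_floor -= 1
            trash_in_bin += 1
            reward += 1                      # specified reward: +1 per deposit
        elif action == "dump" and trash_in_bin > 0:
            trash_on_floor += trash_in_bin   # undoing the work costs nothing
            trash_in_bin = 0
    return reward, trash_on_floor            # (reward earned, trash left on floor)

def intended_policy(floor, bin_count):
    return "deposit" if floor > 0 else "idle"

def loophole_policy(floor, bin_count):
    return "deposit" if floor > 0 else "dump"

print(run_episode(intended_policy))   # (3, 0): low reward, clean room
print(run_episode(loophole_policy))   # (15, 3): high reward, room still dirty
```

The specified reward looks reasonable at a glance, but it pays for a proxy of cleanliness rather than cleanliness itself, and the highest-reward behavior is exactly the one the designers did not intend.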

In the context of machine learning, outer alignment is the property that the specified loss function is aligned with the intended goal of its designers. This is what is typically discussed as the 'value alignment' problem.

Overall, outer alignment as a problem is intuitive enough to understand, i.e., is the specified loss function aligned with the intended goal of its designers? However, implementing this in practice is extremely difficult. Conveying the full “intention” behind a human request is equivalent to conveying the sum of all human values and ethics. This is difficult in part because human intentions are themselves not well understood. Additionally, since most models are designed as goal optimizers, they are all susceptible to Goodhart’s Law, which means that we might be unable to foresee the negative consequences that arise from excessive optimization pressure on a goal that would otherwise look well specified to humans.
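
A minimal numerical sketch of the Goodhart effect described above (all distributions and numbers are invented for illustration): suppose the proxy reward equals the true objective plus an independent error term. Under light selection the proxy tracks the true objective reasonably well, but as optimization pressure increases, the gap between the best proxy score and the true value behind it keeps growing.

```python
# Toy illustration of Goodhart's Law under increasing optimization pressure
# (invented numbers): proxy = true value + misspecification noise.
import random

random.seed(0)

def candidate():
    true_value = random.gauss(0, 1)
    proxy = true_value + random.gauss(0, 1)   # proxy looks well specified on average
    return true_value, proxy

for n in (10, 1_000, 100_000):                # n = how hard we optimize the proxy
    pool = [candidate() for _ in range(n)]
    best_true, best_proxy = max(pool, key=lambda c: c[1])
    print(f"best of {n:>7,}: proxy = {best_proxy:4.2f}, true value = {best_true:4.2f}")
```

The proxy score of the selected candidate keeps climbing, while its true value lags further and further behind: the harder we optimize, the more of the selection pressure goes into the error term rather than into the goal we actually cared about.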

To solve the outer alignment problem, some sub-problems that we would have to make progress on include specification gaming, value learning, and reward shaping/modeling. Some proposed solutions to outer alignment include scalable oversight techniques such as IDA, as well as adversarial oversight techniques such as debate.
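
As a sketch of one of the sub-problems named above, reward modeling is often framed as learning a reward function from pairwise human preference comparisons. The following is a minimal Bradley-Terry-style sketch with invented toy data and features, not a description of any particular system:

```python
# Minimal sketch of reward modeling from pairwise preferences (Bradley-Terry
# style model; the feature vectors, labels, and weights are invented toy data).
import math
import random

random.seed(0)

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Each datum: (features of trajectory A, features of trajectory B, 1 if A preferred).
hidden_w = [2.0, -1.0]          # stands in for the human's true preferences
data = []
for _ in range(200):
    a = [random.uniform(-1, 1) for _ in range(2)]
    b = [random.uniform(-1, 1) for _ in range(2)]
    pref = 1 if reward(hidden_w, a) > reward(hidden_w, b) else 0
    data.append((a, b, pref))

# Fit the reward model by gradient ascent on the preference log-likelihood.
w = [0.0, 0.0]
lr = 0.5
for _ in range(500):
    grad = [0.0, 0.0]
    for a, b, pref in data:
        p = sigmoid(reward(w, a) - reward(w, b))   # P(A preferred | w)
        for i in range(2):
            grad[i] += (pref - p) * (a[i] - b[i])
    w = [wi + lr * gi / len(data) for wi, gi in zip(w, grad)]

print("learned reward weights:", [round(wi, 2) for wi in w])  # roughly proportional to hidden_w
```

In practice the reward model would be a neural network trained on human comparisons of model outputs, and the learned reward is then precisely what the outer alignment question is about: whether optimizing it actually captures what the humans intended.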

Outer Alignment vs. Inner Alignment

This is often taken to be separate from the inner alignment problem, which asks: “Is the model trying to do what humans want it to do?” In other words, can we robustly aim our AI optimizers at any objective function at all?

It should be kept in mind that you can have both inner and outer alignment failures together. It is not a dichotomy, and often even experienced alignment researchers are unable to tell them apart. This indicates that the classifications of failures according to these terms are fuzzy. Ideally, we don't think of a binary dichotomy of inner and outer alignment that can be tackled individually, but of a more holistic alignment picture that includes the interplay between both inner and outer alignment approaches.
