Impact Regularization

Impact regularizers penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try to accomplish its goals with as little impact on the world as possible. You reward the AI for crossing a room; to maximize time-discounted total reward, the optimal policy makes a huge mess as it sprints to the other side.

How do you rigorously define "low impact" in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don't want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don't want highly capable AI systems to permanently wrench control of the future from us. 
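
To make the general recipe concrete, here is a minimal sketch of a reward signal with an impact regularizer attached. This is not any particular published method; `task_reward`, `impact_penalty`, and the coefficient `lam` are illustrative placeholders.

```python
# Minimal sketch of an impact-regularized reward signal. The functions
# task_reward and impact_penalty, and the coefficient lam, are illustrative
# placeholders rather than any specific published formalization.

def regularized_reward(state, action, next_state,
                       task_reward, impact_penalty, lam=0.1):
    """Return the task reward minus a scaled penalty for side effects.

    task_reward(state, action, next_state) -> float: the original objective
        (e.g. reward for reaching the other side of the room).
    impact_penalty(state, action, next_state) -> float: nonnegative, larger the
        more the action changes the world (however "impact" ends up being defined).
    lam: trade-off coefficient between task performance and low impact.
    """
    return (task_reward(state, action, next_state)
            - lam * impact_penalty(state, action, next_state))
```

The open problem is choosing `impact_penalty` so that the room-crossing agent above is penalized for making a mess but not for simply reaching the other side.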

Currently, impact regularization research focuses on two approaches, each roughly sketched in code after this list:

  • Relative reachability: the AI preserves its ability to reach many kinds of world-states. The hope is that by staying able to reach many goal states, the AI stays able to reach the correct goal state.
  • Attainable utility preservation: the AI preserves its ability to achieve one or more auxiliary goals. The hope is that by penalizing gaining or losing control over the future, the AI doesn't take control away from us. 
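
As a rough sketch of how these two penalties are often computed (this abstracts away the baseline, discounting, and normalization choices made in the actual papers; the value estimates `reach` and `aux_q_values` are assumed to be given, e.g. learned separately):

```python
# Rough sketches of the two penalty terms, omitting the baseline and
# normalization details of the original formalizations. reach(state, target)
# estimates how easily `target` can be reached from `state`; each q in
# aux_q_values estimates the value of a state-action pair under one auxiliary
# reward function; noop_action is the agent's "do nothing" action.

def relative_reachability_penalty(next_state, baseline_state, target_states, reach):
    """Average loss of reachability of target states, compared to a baseline
    state (e.g. the state that doing nothing would have produced)."""
    losses = [max(0.0, reach(baseline_state, s) - reach(next_state, s))
              for s in target_states]
    return sum(losses) / len(losses)

def attainable_utility_penalty(state, action, aux_q_values, noop_action):
    """Average absolute change in ability to achieve auxiliary goals, relative
    to doing nothing; gaining ability is penalized as much as losing it."""
    diffs = [abs(q(state, action) - q(state, noop_action)) for q in aux_q_values]
    return sum(diffs) / len(diffs)
```

Penalizing the absolute difference in the second sketch is what gives attainable utility preservation its anti-power-seeking flavor: actions that would make the agent much more able to achieve arbitrary goals are penalized just as heavily as actions that destroy options.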

For a review of earlier work, see A Survey of Early Impact Measures.

Sequences on impact regularization:

  • Reframing Impact: we're impacted when we become more or less able to achieve our goals. Seemingly, goal-directed AI systems are only incentivized to catastrophically impact us in order to gain power to achieve their own goals. To avoid catastrophic impact, what if we penalize the AI for gaining power?
  • Subagents and Impact Measures explores how subagents can circumvent current impact measure formalizations.

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.
