AI ALIGNMENT FORUM

nealeratzlaff — AI Alignment Forum

8

1

7y

Avoiding Side Effects in Complex Environments

by TurnTrout and nealeratzlaff

Previously: Attainable Utility Preservation: Empirical Results; summarized in AN #105.

Our most recent AUP paper was accepted to NeurIPS 2020 as a spotlight presentation:

> Reward function specification can be difficult, even in simple environments. Rewarding the agent for making a widget may be easy, but penalizing the multitude of...

Dec 12, 2020 • 62

Attainable Utility Preservation: Empirical Results

by TurnTrout and nealeratzlaff

Reframing Impact has focused on supplying the right intuitions and framing. Now we can see how these intuitions about power and the AU landscape both predict and explain AUP's empirical success thus far.

Conservative Agency in Gridworlds

Let's start with the known and the easy: avoiding side effects[1] in the...

Feb 22, 2020 • 66