The Theoretical Foundations of Reward Learning

Feb 28, 2025 by Joar Skalse

In this sequence, I provide an overview of the theoretical reward learning research agenda, including its motivating assumptions, several core results, and some starting points for contributing to it.

The Theoretical Reward Learning Research Agenda: Introduction and Motivation
Partial Identifiability in Reward Learning
Misspecification in Inverse Reinforcement Learning
STARC: A General Framework For Quantifying Differences Between Reward Functions
Misspecification in Inverse Reinforcement Learning - Part II
Defining and Characterising Reward Hacking
Other Papers About the Theory of Reward Learning
How to Contribute to Theoretical Reward Learning Research