AI ALIGNMENT FORUM
The Theoretical Foundations of Reward Learning

Feb 28, 2025 by Joar Skalse

In this sequence I provide an overview of the theoretical reward learning research agenda, including its motivating assumptions, several core results, and some starting points for how to contribute to it further.

The Theoretical Reward Learning Research Agenda: Introduction and Motivation
Partial Identifiability in Reward Learning
Misspecification in Inverse Reinforcement Learning
STARC: A General Framework For Quantifying Differences Between Reward Functions
Misspecification in Inverse Reinforcement Learning - Part II
Defining and Characterising Reward Hacking
Other Papers About the Theory of Reward Learning
How to Contribute to Theoretical Reward Learning Research