This is the eighth (and, for now, final) post in the theoretical reward learning sequence, which starts in this post. Here, I will provide a few pointers to anyone who might be interested in contributing to further work on this research agenda, in the form of a few concrete and...
This is the seventh post in the theoretical reward learning sequence, which starts in this post. Here, I will provide shorter summaries of a few additional papers on the theory of reward learning, but without going into as much depth as I did in the previous posts (but if there...
In this post, I will provide a summary of the paper Defining and Characterising Reward Hacking, and explain some of its results. I will assume basic familiarity with reinforcement learning. This is the sixth post in the theoretical reward learning sequence, which starts in this post (though this post is...
In this post, I will provide a summary of the paper Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification, and explain some of its results. I will assume basic familiarity with reinforcement learning. This is the fifth post in the theoretical reward learning sequence, which starts in this post....
In this post, I will provide a summary of the paper STARC: A General Framework For Quantifying Differences Between Reward Functions, and explain some of its results. I will assume basic familiarity with reinforcement learning. This is the fourth post in the theoretical reward learning sequence, which starts in this...
In this post, I will provide a summary of the paper Misspecification in Inverse Reinforcement Learning, and explain some of its results. I will assume basic familiarity with reinforcement learning. This is the third post in the theoretical reward learning sequence, which starts in this post (though this post is...
In this post, I will provide a summary of the paper Invariance in Policy Optimisation and Partial Identifiability in Reward Learning, and explain some of its results. I will assume basic familiarity with reinforcement learning. This is the second post in the theoretical reward learning sequence, which starts in this...