MONA: Managed Myopia with Approval Feedback
Blog post by Sebastian Farquhar, David Lindner, Rohin Shah. It discusses the paper MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking by Sebastian Farquhar, Vikrant Varma, David Lindner, David Elson, Caleb Biddulph, Ian Goodfellow, and Rohin Shah. Our paper tries to make agents that are safer in...