x
Appendices: Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking — AI Alignment Forum