Learning biases and rewards simultaneously — AI Alignment Forum