Counterfactual oversight vs. training data — AI Alignment Forum