Counterfactual oversight vs. training data - History — AI Alignment Forum