Agents detecting agents: counterfactual versus influence — AI Alignment Forum