Phil3 — AI Alignment Forum

89

2

6y

Practical Pitfalls of Causal Scrubbing

41

Jérémy Scheurer, Phil3, tony, jacquesthibs, David Lindner

3y

TL;DR: We evaluate Causal Scrubbing (CaSc) on synthetic graphs with known ground truth to determine its reliability in confirming correct hypotheses and rejecting incorrect ones. First, we show that CaSc can accurately identify true hypotheses and quantify the degree to which a hypothesis is wrong. Second, we highlight some limitations of CaSc, in particular, that it cannot falsify all incorrect hypotheses. We provide concrete examples of false positive results with causal scrubbing. Our main finding is that false positives can occur when there is “cancellation”, i.e., CaSc causes the model to do better on some inputs and worse on others, such that, on average, the scrubbed model recovers the full loss. A second practical failure mode is that CaSc cannot detect whether a proposed hypothesis is specific...
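To make the "cancellation" failure mode concrete, here is a minimal toy sketch. It assumes the scrubbed model is judged only by its average loss. The per-input loss values and the helper names (`mean_loss_gap`, `mean_abs_per_input_gap`) are hypothetical and chosen purely for illustration; they are not taken from the post's experiments or from any causal scrubbing implementation.

```python
from statistics import mean

# Hypothetical per-input losses (made-up numbers for illustration only).
original_loss = [1.0, 1.0, 1.0, 1.0]    # unscrubbed model
scrubbed_loss = [0.5, 1.5, 0.25, 1.75]  # scrubbed model: better on some inputs, worse on others


def mean_loss_gap(original, scrubbed):
    """Difference in average loss, the aggregate quantity being compared."""
    return mean(scrubbed) - mean(original)


def mean_abs_per_input_gap(original, scrubbed):
    """Average absolute per-input change in loss, which cannot cancel out."""
    return mean([abs(s - o) for o, s in zip(original, scrubbed)])


aggregate_gap = mean_loss_gap(original_loss, scrubbed_loss)
per_input_gap = mean_abs_per_input_gap(original_loss, scrubbed_loss)

print(f"aggregate gap: {aggregate_gap}")
print(f"mean absolute per-input gap: {per_input_gap}")
```

In this toy case the average loss is fully recovered even though the scrubbed model behaves differently on every input, which is the pattern the TL;DR describes as a source of false positives. Inspecting a per-input statistic alongside the average is one possible way to surface it, offered here as an assumption rather than as the authors' recommendation.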
