AI ALIGNMENT FORUM
David Lindner

Alignment researcher at Google DeepMind

Comments
Practical Pitfalls of Causal Scrubbing
David Lindner · 3y · 20 karma

Thanks, that's a useful alternative framing of CaSc!

FWIW, I think this adversarial version of CaSc would avoid the main examples in our post where CaSc fails to reject a false hypothesis. The common feature of our examples is "cancellation", which comes from looking at an average CaSc loss. If you only look at the loss of the worst experiment (i.e., the maximum CaSc loss rather than the average), you don't get this kind of cancellation problem.

Plausibly you'd run into different failure modes, though; in particular, I'd guess the maximum measure is less smooth and gives you less information about "how wrong" your hypothesis is.
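As a toy numerical sketch of the cancellation point (all numbers are hypothetical, not from the post): individual scrubbed experiments can deviate substantially from the original loss while the deviations cancel in the average, so the average criterion fails to reject the hypothesis even though the worst-case (maximum) criterion would.

```python
import numpy as np

# Hypothetical CaSc results (illustrative numbers only).
original_loss = 1.0

# Losses of the scrubbed model across five resampling experiments.
# Individual experiments are far from the original loss, but the
# deviations happen to cancel in the average.
scrubbed_losses = np.array([0.2, 1.8, 0.5, 1.5, 1.0])

avg_loss = scrubbed_losses.mean()  # 1.00 == original_loss: the average
                                   # criterion fails to reject the hypothesis
max_loss = scrubbed_losses.max()   # 1.80 >> original_loss: the worst-case
                                   # (adversarial) criterion rejects it

print(f"average CaSc loss:    {avg_loss:.2f} (original: {original_loss:.2f})")
print(f"worst-case CaSc loss: {max_loss:.2f} (original: {original_loss:.2f})")
```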

Posts

MONA: Three Month Later - Updates and Steganography Without Optimization Pressure · 17 karma · 6mo · 0 comments
MONA: Managed Myopia with Approval Feedback · 44 karma · 9mo · 7 comments
On scalable oversight with weak LLMs judging strong LLMs · 22 karma · 1y · 18 comments
VLM-RM: Specifying Rewards with Natural Language · 11 karma · 2y · 0 comments
Practical Pitfalls of Causal Scrubbing · 40 karma · 3y · 9 comments
Threat Model Literature Review · 36 karma · 3y · 3 comments
Clarifying AI X-risk · 45 karma · 3y · 16 comments