Pranav Gade

Posts

Sorted by New

52unRLHF - Efficiently undoing LLM safeguards

1y

2

Wiki Contributions

Comments

Sorted by

Newest

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

Pranav Gade2y10

I ended up throwing this(https://github.com/pranavgade20/causal-verifier) together over the weekend - it's probably very limited compared to redwood's thing, but seems to work on the one example I've tried.

Reply