AI ALIGNMENT FORUM
Redwood Research
Posts tagged Redwood Research
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research], by Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, and Nate Thomas (2y ago)
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley, by maxnadeau, Xander Davies, Buck Shlegeris, and Nate Thomas (2y ago)
Takeaways from our robust injury classifier project [Redwood Research], by dmz (2y ago)
Benchmarks for Detecting Measurement Tampering [Redwood Research], by Ryan Greenblatt and Fabien Roger (1y ago)
Redwood Research’s current project, by Buck Shlegeris (3y ago)
Redwood's Technique-Focused Epistemic Strategy, by Adam Shimi (3y ago)
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler, by DanielFilan (2y ago)
High-stakes alignment via adversarial training [Redwood Research report], by dmz, Lawrence Chan, and Nate Thomas (2y ago)
Why I'm excited about Redwood Research's current project, by Paul Christiano (3y ago)
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small, by KevinRoWang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt (2y ago)
Practical Pitfalls of Causal Scrubbing, by Jérémy Scheurer, Phil3, tony, Jacques Thibodeau, and David Lindner (2y ago)
We're Redwood Research, we do applied alignment research, AMA, by Nate Thomas (3y ago)
Causal scrubbing: results on induction heads, by Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Tao Lin, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, and Nate Thomas (2y ago)
Causal scrubbing: results on a paren balance checker, by Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Tao Lin, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, and Nate Thomas (2y ago)
Causal scrubbing: Appendix, by Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, and Nate Thomas (2y ago)