AI ALIGNMENT FORUM
Redwood Research
Posts tagged Redwood Research
6
98
Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas
1y
24
0
59
Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau, Xander Davies, Buck Shlegeris, Nate Thomas
1y
5
2
55
Takeaways from our robust injury classifier project [Redwood Research]
dmz
1y
6
2
45
Benchmarks for Detecting Measurement Tampering [Redwood Research]
Ryan Greenblatt, Fabien Roger
3mo
6
0
57
Redwood Research’s current project
Buck Shlegeris
2y
18
1
27
Redwood's Technique-Focused Epistemic Strategy
Adam Shimi
2y
1
2
10
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
1y
0
0
65
High-stakes alignment via adversarial training [Redwood Research report]
dmz, Lawrence Chan, Nate Thomas
2y
15
0
55
Why I'm excited about Redwood Research's current project
Paul Christiano
2y
3
0
47
Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small
KevinRoWang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
1y
2
1
40
Practical Pitfalls of Causal Scrubbing
Jérémy Scheurer, Phil3, tony, Jacques Thibodeau, David Lindner
8mo
9
0
21
We're Redwood Research, we do applied alignment research, AMA
Nate Thomas
2y
2
1
20
Causal scrubbing: results on a paren balance checker
Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Tao Lin, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas
1y
2
1
22
Causal scrubbing: results on induction heads
Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Tao Lin, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas
1y
0
1
11
Causal scrubbing: Appendix
Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas
1y
1