AI ALIGNMENT FORUM


Redwood Research

Edited by Dakara; last updated 30th Dec 2024

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

The initial directions of their research agenda include:

  • AI control
  • Evaluations and demonstrations of risk from strategic deception
  • Consulting on risks from misalignment
Posts tagged Redwood Research
192 · Alignment Faking in Large Language Models
Ryan Greenblatt, Evan Hubinger, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Sam Bowman, Buck Shlegeris
6mo
24
115 · The case for ensuring that powerful AIs are controlled
Ryan Greenblatt, Buck Shlegeris
1y
32
103 · Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
Lawrence Chan, Adrià Garriga-Alonso, Nicholas Goldowsky-Dill, Ryan Greenblatt, Jenny Nitishinskaya, Ansh Radhakrishnan, Buck Shlegeris, Nate Thomas
3y
29
102 · AI Control: Improving Safety Despite Intentional Subversion
Buck Shlegeris, Fabien Roger, Ryan Greenblatt, Kshitij Sachan
2y
5
57 · Takeaways from our robust injury classifier project [Redwood Research]
dmz
3y
7
51 · Benchmarks for Detecting Measurement Tampering [Redwood Research]
Ryan Greenblatt, Fabien Roger
2y
11
57 · Redwood Research’s current project
Buck Shlegeris
4y
18
59 · Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau, Xander Davies, Buck Shlegeris, Nate Thomas
3y
5
61 · Preventing Language Models from hiding their reasoning
Fabien Roger, Ryan Greenblatt
2y
5
62 · Catching AIs red-handed
Ryan Greenblatt, Buck Shlegeris
2y
8
27 · Redwood's Technique-Focused Epistemic Strategy
Adam Shimi
4y
1
10 · AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
3y
0
81 · Will alignment-faking Claude accept a deal to reveal its misalignment?
Ryan Greenblatt, Kyle Fish
5mo
5
86 · How will we update about scheming?
Ryan Greenblatt
5mo
3
65 · High-stakes alignment via adversarial training [Redwood Research report]
dmz, Lawrence Chan, Nate Thomas
3y
15
(Showing 15 of 44 posts)