AI ALIGNMENT FORUM

Redwood Research

Edited by Dakara, last updated 30th Dec 2024

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

The initial directions of their research agenda include:

  • AI control
  • Evaluations and demonstrations of risk from strategic deception
  • Consulting on risks from misalignment
Posts tagged Redwood Research

  • Alignment Faking in Large Language Models
    ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck; 10mo ago; 192 karma, 24 comments
  • The case for ensuring that powerful AIs are controlled
    ryan_greenblatt, Buck; 2y ago; 110 karma, 32 comments
  • Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
    LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, Nate Thomas; 3y ago; 103 karma, 29 comments
  • AI Control: Improving Safety Despite Intentional Subversion
    Buck, Fabien Roger, ryan_greenblatt, Kshitij Sachan; 2y ago; 102 karma, 5 comments
  • Takeaways from our robust injury classifier project [Redwood Research]
    dmz; 3y ago; 57 karma, 7 comments
  • Benchmarks for Detecting Measurement Tampering [Redwood Research]
    ryan_greenblatt, Fabien Roger; 2y ago; 51 karma, 11 comments
  • Redwood Research’s current project
    Buck; 4y ago; 57 karma, 18 comments
  • Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
    maxnadeau, Xander Davies, Buck, Nate Thomas; 3y ago; 59 karma, 5 comments
  • Preventing Language Models from hiding their reasoning
    Fabien Roger, ryan_greenblatt; 2y ago; 61 karma, 5 comments
  • Catching AIs red-handed
    ryan_greenblatt, Buck; 2y ago; 62 karma, 8 comments
  • Redwood's Technique-Focused Epistemic Strategy
    adamShimi; 4y ago; 27 karma, 1 comment
  • AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
    DanielFilan; 3y ago; 10 karma, 0 comments
  • Will alignment-faking Claude accept a deal to reveal its misalignment?
    ryan_greenblatt, Kyle Fish; 10mo ago; 81 karma, 5 comments
  • How will we update about scheming?
    ryan_greenblatt; 10mo ago; 86 karma, 3 comments
  • High-stakes alignment via adversarial training [Redwood Research report]
    dmz, LawrenceC, Nate Thomas; 4y ago; 65 karma, 15 comments

(Showing 15 of 44 tagged posts.)