AI ALIGNMENT FORUM


Redwood Research

Edited by Dakara, last updated 30th Dec 2024

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

Its initial research directions include:

  • AI control
  • Evaluations and demonstrations of risk from strategic deception
  • Consulting on risks from misalignment
Posts tagged Redwood Research
192 Alignment Faking in Large Language Models
ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman, Buck
8mo
24
110 The case for ensuring that powerful AIs are controlled
ryan_greenblatt, Buck
2y
32
103 Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
LawrenceC, Adrià Garriga-alonso, Nicholas Goldowsky-Dill, ryan_greenblatt, jenny, Ansh Radhakrishnan, Buck, Nate Thomas
3y
29
102 AI Control: Improving Safety Despite Intentional Subversion
Buck, Fabien Roger, ryan_greenblatt, Kshitij Sachan
2y
5
57 Takeaways from our robust injury classifier project [Redwood Research]
dmz
3y
7
51 Benchmarks for Detecting Measurement Tampering [Redwood Research]
ryan_greenblatt, Fabien Roger
2y
11
57 Redwood Research’s current project
Buck
4y
18
59 Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley
maxnadeau, Xander Davies, Buck, Nate Thomas
3y
5
61 Preventing Language Models from hiding their reasoning
Fabien Roger, ryan_greenblatt
2y
5
62 Catching AIs red-handed
ryan_greenblatt, Buck
2y
8
27 Redwood's Technique-Focused Epistemic Strategy
adamShimi
4y
1
10 AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
3y
0
81 Will alignment-faking Claude accept a deal to reveal its misalignment?
ryan_greenblatt, Kyle Fish
8mo
5
86 How will we update about scheming?
ryan_greenblatt
8mo
3
65 High-stakes alignment via adversarial training [Redwood Research report]
dmz, LawrenceC, Nate Thomas
3y
15