AI ALIGNMENT FORUM
Redwood Research
• Applied to Benchmarks for Detecting Measurement Tampering [Redwood Research] by Magdalena Wache 1y ago
• Applied to LLMs are (mostly) not helped by filler tokens by Kshitij Sachan 1y ago
• Applied to Critiques of prominent AI safety labs: Redwood Research by Anonymous Omega 1y ago
• Applied to [Linkpost] Critiques of Redwood Research by Akash 1y ago
• Applied to Some common confusion about induction heads by Ruben Bloom 1y ago
• Applied to Practical Pitfalls of Causal Scrubbing by Jérémy Scheurer 2y ago
• Applied to Causal scrubbing: Appendix by Jenny Nitishinskaya 2y ago
• Applied to Causal scrubbing: results on induction heads by Jenny Nitishinskaya 2y ago
• Applied to Causal scrubbing: results on a paren balance checker by Jenny Nitishinskaya 2y ago
• Applied to Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] by Jenny Nitishinskaya 2y ago
• Applied to Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small by Alexandre Variengien 2y ago
• Applied to Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley by Luna Rimar 2y ago
• Applied to Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small by Multicore 2y ago
• Applied to Takeaways from our robust injury classifier project [Redwood Research] by Ruben Bloom 2y ago
• Applied to AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler by DanielFilan 2y ago
• Applied to High-stakes alignment via adversarial training [Redwood Research report] by Multicore 2y ago
• Applied to Redwood Research is hiring for several roles (Operations and Technical) by Jessica W 2y ago
• Applied to Redwood's Technique-Focused Epistemic Strategy by Ruben Bloom 3y ago
• Applied to Redwood Research is hiring for several roles by Multicore 3y ago
• Applied to Redwood Research’s current project by Multicore 3y ago