AI ALIGNMENT FORUM
Debate (AI safety technique)
• Applied to Deception Chess: Game #2 by RobertM 5d ago
• Applied to AI debate: test yourself against chess 'AIs' by Richard Willis 13d ago
• Applied to Debate helps supervise human experts [Paper] by Roger Dearnaley 18d ago
• Applied to AI Safety 101 - Chapter 5.1 - Debate by Charbel-Raphael Segerie 1mo ago
• Applied to Evaluating Superhuman Models with Consistency Checks by Daniel Paleka 4mo ago
• Applied to A Proposal for AI Alignment: Using Directly Opposing Models by Arne B 7mo ago
• Applied to Empathy bandaid for immediate AI catastrophe by installgentoo 8mo ago
• Applied to [New LW Feature] "Debates" by Jim Babcock 8mo ago
• Applied to Why I’m not working on {debate, RRM, ELK, natural abstractions} by Steve Byrnes 10mo ago
• Applied to Alignment with argument-networks and assessment-predictions by Tor Økland Barstad 1y ago
• Applied to Take 9: No, RLHF/IDA/debate doesn't solve outer alignment. by Ruben Bloom 1y ago
• Applied to Notes on OpenAI’s alignment plan by RobertM 1y ago
• Applied to Questions about Value Lock-in, Paternalism, and Empowerment by Sam F. Brown 1y ago
• Applied to AI Unsafety via Non-Zero-Sum Debate by Noosphere89 1y ago