AI ALIGNMENT FORUM
Debate (AI safety technique)

Edited by Bird Concept, plex last updated 15th Jul 2022

Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluators are not experts themselves and cannot fully verify the answers.[1] The technique was suggested as part of an approach to building advanced AI systems that are aligned with human values, and to safely applying machine learning techniques to problems that are high-stakes but not well-defined (such as advancing science or increasing a company's revenue).[2][3]

  1. ^ https://www.lesswrong.com/posts/Br4xDbYu4Frwrb64a/writeup-progress-on-ai-safety-via-debate-1

  2. ^ https://ought.org/mission

  3. ^ https://openai.com/blog/debate/

Posts tagged Debate (AI safety technique)
Writeup: Progress on AI Safety via Debate (Beth Barnes, paulfchristiano; 6y; 50 karma, 15 comments)
A guide to Iterated Amplification & Debate (Rafael Harth; 5y; 28 karma, 0 comments)
Debate update: Obfuscated arguments problem (Beth Barnes; 5y; 64 karma, 15 comments)
An alignment safety case sketch based on debate (Marie_DB, Jacob Pfau, Benjamin Hilton, Geoffrey Irving; 6mo; 35 karma, 15 comments)
How should AI debate be judged? (abramdemski, paulfchristiano; 5y; 24 karma, 26 comments)
AI Safety via Debate (ESRogs; 8y; 11 karma, 10 comments)
Optimal play in human-judged Debate usually won't answer your question (Joe Collman; 5y; 20 karma, 8 comments)
An overview of 11 proposals for building safe advanced AI (evhub; 5y; 72 karma, 32 comments)
My Overview of the AI Alignment Landscape: A Bird's Eye View (Neel Nanda; 4y; 38 karma, 4 comments)
Imitative Generalisation (AKA 'Learning the Prior') (Beth Barnes; 5y; 58 karma, 13 comments)
Why I'm excited about Debate (Richard_Ngo; 5y; 34 karma, 12 comments)
Why I’m not working on {debate, RRM, ELK, natural abstractions} (Steven Byrnes; 3y; 32 karma, 16 comments)
Three mental images from thinking about AGI debate & corrigibility (Steven Byrnes; 5y; 23 karma, 32 comments)
Looking for adversarial collaborators to test our Debate protocol (Beth Barnes; 5y; 21 karma, 5 comments)
Reward button alignment (Steven Byrnes; 6mo; 25 karma, 10 comments)