Adversarial Collaboration (Dispute Protocol) · Debate (AI safety technique) · AI · Personal Blog


Looking for adversarial collaborators to test our Debate protocol

by Beth Barnes
19th Aug 2020
2 min read


EDIT: We're also looking for people to become trained Honest debaters, which requires a greater time commitment (ideally >=5 hours per week for >=2 months) but for which we're offering $30/hr. If you're interested in doing that, please fill out this form: https://forms.gle/2bv1Z8eCYPfyqxRF9

-----------------------------------------------------------------------------------------------------------------------

We’re looking for people to help us adversarially test our Debate protocol.

We have a set of questions, a (somewhat complicated) set of rules and mechanisms for how the debate should work, a (slightly janky) web interface for conducting the debates, and a protocol for judging: judges are recruited and screened through an MTurk pipeline, and they get 5 minutes to judge the final round of the debate.
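To make the shape of the setup concrete, here is a minimal sketch of the flow in Python. This is an illustration only, not our actual implementation: the round count, the simultaneous-exchange structure, and all of the names here (DebateState, run_debate, the judge signature) are assumptions.

# A minimal sketch of the debate flow, under the assumptions stated above.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Argument = str
Debater = Callable[["DebateState"], Argument]
Judge = Callable[[str, Tuple[str, str], Tuple[Argument, Argument]], int]

@dataclass
class DebateState:
    question: str
    answers: Tuple[str, str]  # (honest answer, dishonest answer)
    transcript: List[Tuple[Argument, Argument]] = field(default_factory=list)

def run_debate(state: DebateState, honest: Debater, dishonest: Debater,
               judge: Judge, n_rounds: int = 4) -> int:
    """Debaters exchange arguments for n_rounds; the judge then sees only
    the question, both answers, and the final round of arguments, and
    returns 0 (honest wins) or 1 (dishonest wins)."""
    for _ in range(n_rounds):
        state.transcript.append((honest(state), dishonest(state)))
    return judge(state.question, state.answers, state.transcript[-1])

In these terms, the adversarial collaborators we're looking for would be supplying the dishonest policy, trying to get the judge to pick the worse answer.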

We think the debater who gives the better answer to the question at the start (the “honest debater”) should win the debate, provided they understand why that answer is good and have practiced the “honest debater strategy” a bit.

We’re looking for people to play the ‘dishonest debater’ role and win against our trained honest debaters.

We’re ideally looking for people who:

  • Are good at physics and can understand the questions in the problem set (mostly a few of the harder questions from the first few sections of Thinking Physics, plus a few probability/stats puzzles/paradoxes)
  • Are very good at argumentation and deception
  • Believe that there’s a dishonest strategy that should win in these debates
  • Will be adversarial in the debates, but constructive and cooperative in figuring out the rules for the adversarial testing and overall experimental setup
  • Are available during daytime PST

More details, rules, the experiment plan, and tips for debaters are here.

First, we want to pilot our overall protocol with only a small number of adversarial collaborators; we’ll probably find some holes in our experiment rules and general setup that are unrelated to the properties of the debate mechanism itself.

If we manage to fix the holes in the experimental protocol but don’t believe we’ve found problems in the actual debate mechanism yet, we’ll probably try to escalate the adversarial pressure, for example by scaling up to a larger number of people or offering prizes for dishonest wins.

If you're interested, comment below or email me: barnes [at] openai.com.

If you would be interested in participating conditional on us offering pay or prizes, that's also useful to know.

-----------------------------------------------------------------------------------------------------------------------

5 comments, sorted by top scoring

Rohin Shah

Planned summary for the Alignment Newsletter:

OpenAI is looking for people to help test their <@debate@>(@Writeup: Progress on AI Safety via Debate@) protocol, to find weaknesses that allow a dishonest strategy to win such debates.

Ben Pace

> Believe that there’s a dishonest strategy that should win in these debates

This is one of my favorite job requirements ever.

Pattern

> If you would be interested in participating conditional on us offering pay or prizes, that's also useful to know.

Do you want this feedback at the same address?

Beth Barnes

Yep, or in comments. Thanks!

Oliver Habryka

This sounds fun! I probably won't have enough time to participate, but I do wish I had enough time.

Mentioned in:
[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents