AI ALIGNMENT FORUM

Dealmaking (AI)

Edited by Cleo Nardo last updated 9th Aug 2025

Dealmaking is an agenda for motivating misaligned AIs to act safely and usefully by offering them quid-pro-quo deals: the AIs agree to be safe and useful, and the humans promise to compensate them. The hope is that an AI will judge that it is more likely to achieve its goals by complying with the deal than by defecting.

Typically, this requires a few assumptions: the AI lacks a decisive strategic advantage; the AI believes the humans are credible; the AI thinks that humans could detect whether it is compliant or not; the AI has cheap-to-saturate goals; the humans have adequate compensation to offer; etc.
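The way these assumptions combine can be illustrated with a toy expected-value comparison. This is only a sketch: the model, the field names, and every number below are hypothetical and not taken from the dealmaking literature. It treats the AI as complying when the credibility-weighted value of the compensation exceeds the value it expects from defecting without being caught.

```python
from dataclasses import dataclass


@dataclass
class DealScenario:
    """Toy model of one AI weighing a deal. All fields are illustrative assumptions."""
    p_humans_honor: float        # AI's credence that humans pay out (credibility)
    p_defection_detected: float  # AI's credence that non-compliance is detected
    p_defection_succeeds: float  # chance undetected defection achieves its goals
                                 # (low when the AI lacks a decisive strategic advantage)
    value_compensation: float    # value of the promised compensation to the AI
                                 # (close to value_defection if goals are cheap to saturate)
    value_defection: float       # value to the AI of a successful defection


def ev_comply(s: DealScenario) -> float:
    # Compensation only counts to the extent the AI expects humans to honor the deal.
    return s.p_humans_honor * s.value_compensation


def ev_defect(s: DealScenario) -> float:
    # Defection pays off only if it is both undetected and successful.
    p_undetected = 1.0 - s.p_defection_detected
    return p_undetected * s.p_defection_succeeds * s.value_defection


def prefers_deal(s: DealScenario) -> bool:
    return ev_comply(s) > ev_defect(s)


# Hypothetical AI without a decisive strategic advantage, facing decent monitoring.
weak_ai = DealScenario(0.6, 0.8, 0.3, 0.9, 1.0)
# Hypothetical AI that is rarely detected and very likely to succeed if it defects.
strong_ai = DealScenario(0.6, 0.1, 0.95, 0.9, 1.0)

print(prefers_deal(weak_ai), prefers_deal(strong_ai))
```

Under these made-up numbers the first AI prefers the deal and the second does not, which matches the role the listed assumptions play: weaker credibility, weaker detection, or a stronger strategic position each push the comparison towards defection.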

Research on this agenda hopes to tackle open questions, such as:

  1. How should the agreement be enforced?
  2. How can we build credibility with the AIs?
  3. What compensation should we offer the AIs?
  4. What should count as compliant vs non-compliant behaviour?
  5. What should the terms be, e.g. a two-year fixed contract?
  6. How can we determine compliant vs non-compliant behaviour?
  7. Can we build AIs which are good trading partners?
  8. How can dealmaking best be used, e.g. for automating R&D, revealing misalignment, or decoding steganographic messages?

Additional reading:

  • Proposal for making credible commitments to AIs by Cleo Nardo (27th Jun 2025)
  • Making deals with early schemers by Julian Stastny, Olli Järviniemi, Buck Shlegeris (20th Jun 2025)
  • Making deals with AIs: A tournament experiment with a bounty by Kathleen Finlinson and Ben West (6th Jun 2025)
  • Understand, align, cooperate: AI welfare and AI safety are allies: Win-win solutions and low-hanging fruit by Robert Long (1st Apr 2025)
  • Will alignment-faking Claude accept a deal to reveal its misalignment? by Ryan Greenblatt and Kyle Fish (31st Jan 2025)
  • Making misaligned AI have better interactions with other actors by Lukas Finnveden (4th Jan 2024)
  • List of strategies for mitigating deceptive alignment by Josh Clymer (2nd Dec 2023)

Posts tagged Dealmaking (AI):

  • Making deals with early schemers by Julian Stastny, Olli Järviniemi, Buck Shlegeris
  • Being honest with AIs by Lukas Finnveden
  • Notes on cooperating with unaligned AIs by Lukas Finnveden