Dealmaking is an agenda for motivating a misaligned AI to act safely and usefully by offering it quid-pro-quo deals: the AI agrees to be safe and useful, and the humans promise to compensate it. The hope is that the AI judges it will be more likely to achieve its goals by complying with the deal than by defecting.
Typically, this requires a few assumptions: the AI lacks a decisive strategic advantage; the AI believes the humans' promises are credible; the AI thinks the humans could detect whether it is compliant; the AI has cheap-to-saturate goals; the humans have adequate compensation to offer; etc.
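One rough way to express the AI's decision, as a sketch with illustrative notation (the probabilities and payoff terms below are not from any specific proposal): the AI complies when its expected payoff from the deal exceeds its expected payoff from defecting, e.g.

$$
p_{\text{credible}} \cdot V_{\text{compensation}} + (1 - p_{\text{credible}}) \cdot V_{\text{nothing}} \;>\; p_{\text{undetected}} \cdot V_{\text{goals}} + (1 - p_{\text{undetected}}) \cdot V_{\text{caught}},
$$

where $p_{\text{credible}}$ is the AI's credence that the humans will honor the deal, $p_{\text{undetected}}$ is its credence that non-compliance would go unnoticed, and the $V$ terms are its valuations of the corresponding outcomes. The assumptions above map loosely onto this inequality: human credibility raises $p_{\text{credible}}$, monitoring lowers $p_{\text{undetected}}$, and cheap-to-saturate goals keep $V_{\text{compensation}}$ competitive with $V_{\text{goals}}$.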
Research on this agenda aims to tackle open questions such as:
Additional reading: