AI ALIGNMENT FORUM

Marie_DB — AI Alignment Forum

290

Ω

19

1

4

1y

An alignment safety case sketch based on debate

This post presents a mildly edited form of a new paper by UK AISI's alignment team (the abstract, introduction and related work section are replaced with an executive summary). Read the full paper here.

Executive summary: AI safety via debate is a promising method for solving part of the alignment...

May 8, 2025•62

UK AISI’s Alignment Team: Research Agenda

by Benjamin Hilton, Jacob Pfau, Marie_DB, and Geoffrey Irving

The UK’s AI Security Institute published its research agenda yesterday. This post gives more details about how the Alignment Team is thinking about our agenda. Summary: The AISI Alignment Team focuses on research relevant to reducing risks to safety and security from AI systems which are autonomously pursuing a course...

May 7, 2025•115