The Global Call for AI Red Lines was signed by 12 Nobel Prize winners, 10 former heads of state and ministers, and over 300 prominent signatories. It was launched at the UN General Assembly and presented to the UN Security Council. There is still much to be done, so we need to capitalize on this momentum. We are sharing this agenda to solicit feedback and collaboration.
The mission of the agenda is to help catalyse international agreement to prevent unacceptable AI risks as soon as possible.
Here are the main research projects I think are important for moving the needle. We need all hands on deck:
Policy Strategists: Designing the International Agreement
Policy Advocacy: Stakeholder Engagement
Technical Governance: Identifying Good Red Lines
Governance Analysis: Formalization and Legal Drafting
Technical Analysis: Standards harmonization and Operationalization
Domain Experts: Better Risk Modeling for Better Thresholds
Forecasters: Creating a Sense of Urgency
Policy Strategists: Designing the International Agreement
Type of international agreement: Determine the most effective avenues for securing an international agreement. For a given red line, should we prioritize:
a UN Security Council resolution,
a UN General Assembly resolution,
an agreement within the G7 (e.g., following the Hiroshima AI Process), or
a bilateral agreement or coalition of the willing formed at an international summit (optimizing for speed)?
Historical analysis: Examine the primary historical pathways to international agreements. For instance, what were the necessary initial conditions for the Montreal Protocol? How did 195 countries with very different stakes in the issue successfully converge to ban ozone-depleting gases?
Specific Goal: How fast can red lines be negotiated? How quickly have regulatory red lines been established and enforced in prior cases involving emerging technology risks? What factors predicted rapid vs. slow adoption? Many ways to accelerate are listed here. *
Fallback Strategies: What are the alternative strategies if the international red lines plan fails?
* For example, time pressure is one of the common elements of fast international agreements. From credible threat to global treaty in under 2.5 years:
1974: Scientists first warn of ozone depletion from CFCs.
May 1985: The discovery of the Antarctic ozone hole creates a massive, undeniable sense of urgency.
Dec 1986: Formal negotiations begin.
Sept 1987: The Montreal Protocol is signed.
Policy Advocacy: Stakeholder Engagement
Finding our champions: Identify and engage key actors in AI governance who could champion an international agreement. Costa Rica and Spain organized the UN's Global Dialogue. The UK created the first AI Safety Institute and initiated the AI Safety Summits. Who could be the champion now?
Coordination with the ecosystem: Develop a multi-stakeholder "all-stars paper" in which key partner institutions contribute a section outlining their roles in achieving an international agreement.
Existing International Commitments: It is essential to demonstrate to diplomats that we are not starting from scratch, as evident in this FAQ. This work would involve fleshing out those answers with what States have said in international fora, such as the UN and the OECD. An example of such work is this brief analysis of the last high-level UN Security Council session, which shows that nearly all members support regulating AI or establishing red lines.
Technical Governance: Identifying Good Red Lines
Red lines are one of the only ways to keep us from boiling the frog. I don’t think there will necessarily be a phase transition between today’s GPT-5 and superintelligence, and I don’t expect a big warning shot either. We may cross the point of no return without realizing it. Any useful red line must therefore be drawn before that point.
Conducting a survey: Develop and conduct a survey to determine which specific red lines to prioritize and which mechanisms are most effective for implementation. This could take the form of a broad survey of civil society organizations or citizens' jury deliberations.
Paper analyzing the pros and cons of different red lines: Assess the benefits of implementing each candidate red line against the potential costs of doing so. Some red lines could be net negative, and we should probably focus only on those addressing severe international risks. [Footnote: Some red lines could be quite net negative. When GPT-2 was developed, OpenAI was wary of publishing it and releasing the weights. However, we now know that it would probably have been an error to try to ban GPT-3-level models. We don’t want to cry wolf too soon (and, at the same time, given the data available at that point, it was very reasonable to be cautious about the next version of GPT).]
Identifying specific red lines where geopolitical rivals can agree: What AI capabilities do potential adversary states (based on public statements, military doctrine, strategic analysis, etc.) themselves fear or consider destabilizing? This would require writing a paper similar to In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?, but focused on red lines. It might be especially useful for helping US stakeholders support the Red Lines initiative.
Governance Analysis: Formalization and Legal Drafting
Developing model legislation and treaty language for red lines: What specific, legally precise language could jurisdictions use to codify AI red lines in international agreements and domestic law? This would provide ready-to-adopt legislative and treaty text that policymakers can introduce immediately. For example, this draft treaty from MIRI and this one from taisc.org are interesting first drafts that could be adapted. Such legislation should probably include mechanisms to update the technical operationalization of the red lines, and some safety obligations/prohibitions, as science evolves. For example, the Code of Practice and the guidelines issued by the AI Office are mechanisms that provide more flexibility than the AI Act itself.
Verification mechanisms for AI red lines: What specific technical and institutional verification methods could credibly demonstrate compliance with AI red lines without compromising proprietary information or security? Could the International Network of AISIs take a role in this? Joint testing exercises provide historical precedents.
Emergency response for AI red lines: What should key actors do if an AI red line is crossed? One baseline is outlined in the AI Code of Practice for models that are not compliant; however, we could also envision an escalating set of political measures, ranging from taxes to the severe protective actions outlined in MIRI’s draft treaty or in Superintelligence Strategy.
Technical Analysis: Standards harmonization and Operationalization
Many AI companies have already proposed their own red lines (see the footnotes). The problem is that they are often weak and/or vague. Without industry standards with clear, measurable definitions, it is easy to 'move the goalposts' by reinterpreting the rule to allow the next, more powerful model.
Harmonization and operationalization of industry thresholds: Analyze the public Responsible Scaling Policies (RSPs) of major labs (e.g., OpenAI, Anthropic, DeepMind) to propose a merged, harmonized framework for unacceptable risk thresholds. The goal is to be able to state that "all major labs already support this minimum red line." We began thinking about this in the AI Safety Frontier Lab Coordination Workshop, organized in October in New York.
Red Lines Tracker: How far are we from unacceptable risks from AI according to labs’ dangerous capability evaluations? Create a public-facing website with a set of indicators that signal when AI systems are approaching predefined, dangerous capability thresholds, thereby serving as an early warning system for policy action.
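To give a sense of what such a tracker could expose, here is a minimal Python sketch of one indicator entry; every name and number in it is a hypothetical placeholder, not a real evaluation result:

```python
from dataclasses import dataclass

@dataclass
class RedLineIndicator:
    """One tracker entry: how close current models are to a declared threshold."""
    red_line: str         # e.g., "self-proliferation" (hypothetical label)
    metric: str           # the benchmark used to operationalize the red line
    current_score: float  # best publicly reported score on that benchmark
    threshold: float      # score at which the red line would be considered breached

    @property
    def headroom(self) -> float:
        """Fraction of the threshold not yet reached (1.0 = far away, 0.0 = breached)."""
        return max(0.0, 1.0 - self.current_score / self.threshold)

# Placeholder entry, for illustration only.
example = RedLineIndicator(
    red_line="Self-proliferation",
    metric="share of self-proliferation tasks completed end-to-end",
    current_score=0.4,
    threshold=0.8,
)
print(f"{example.red_line}: {example.headroom:.0%} headroom before the red line")
```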
To make harmonization concrete, we could begin with the following two red lines:
OpenAI, in their preparedness framework, proposes the following red line: "The model is capable of recursively self-improving (i.e., fully automated AI R&D), defined as either (leading indicator) a superhuman research scientist agent OR (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks) sustainably for several months. Until we have specified safeguards and security controls that would meet a Critical standard, halt further development."
Anthropic's AI R&D-5 threshold is similar, but slightly different: “The ability to cause dramatic acceleration in the rate of effective scaling. Specifically, this would be the case if we observed or projected an increase in the effective training compute of the world's most capable model that, over the course of a year, was equivalent to two years of the average rate of progress during the period of early 2018 to early 2024. We roughly estimate that the 2018-2024 average scaleup was around 35x per year, so this would imply an actual or projected one-year scaleup of 35^2 = ~1000x.”
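To show why harmonization is non-trivial, here is a back-of-the-envelope comparison of the two triggers. The 35x baseline and the 1/5th factor come from the quotes above; the 20-week duration of a 2024-style generational improvement is my own illustrative assumption:

```python
# Anthropic AI R&D-5: two years of average progress compressed into one year.
baseline_scaleup_per_year = 35                        # stated 2018-2024 average effective-compute scaleup
anthropic_trigger = baseline_scaleup_per_year ** 2    # ~1225x, i.e. the "~1000x" in the quote
print(f"Anthropic trigger: ~{anthropic_trigger}x effective compute in one year")

# OpenAI preparedness framework: a generational model improvement in 1/5th of the
# 2024 wall-clock time, sustained for several months.
assumed_2024_generation_weeks = 20                    # illustrative assumption, not from the quote
openai_trigger_weeks = assumed_2024_generation_weeks / 5
print(f"OpenAI trigger: a generational improvement in ~{openai_trigger_weeks:.0f} weeks")

# Both criteria target the same phenomenon (a sharp acceleration of AI R&D), but one is
# framed in effective compute and the other in wall-clock speed-up; closing that gap is
# exactly what a harmonized standard would have to do.
```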
Domain Experts: Better Risk Modeling for Better Thresholds
Improving risk modeling: This is necessary to demonstrate that the risk thresholds currently being considered by some labs are too high.
In civil aviation, safety standards require that a catastrophic aircraft system failure be limited to a probability of about one in a billion flight hours. For x-risk, we should aim to reach at least this standard; a rough illustration of what it implies is sketched below.
Another reason better risk modeling is urgent is that the Code of Practice of the AI Act requires companies to use state-of-the-art (SoTA) risk modeling in their safety and security frameworks, and the current SoTA is poor; more information in this blog post.
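As a rough illustration of how demanding the aviation standard is, the sketch below translates "one in a billion flight hours" into an annual figure for a continuously operated system; the continuous-operation assumption and the analogy between flight hours and AI deployment are mine, not an established conversion:

```python
# Civil-aviation certification target: catastrophic failure <= 1e-9 per flight hour.
catastrophic_failure_per_hour = 1e-9
hours_per_year = 24 * 365            # a system assumed to operate continuously

# Probability of at least one catastrophic failure over a year of continuous operation.
annual_risk = 1 - (1 - catastrophic_failure_per_hour) ** hours_per_year
print(f"Annual catastrophic-risk budget at aviation standards: ~{annual_risk:.1e}")
# ~8.8e-6 per year, i.e. far stricter than the risk thresholds this section argues
# some labs are currently willing to consider.
```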
Proposing more grounded risk thresholds: This is also useful for the AI Act and the CoP “Measure 4.1 Systemic risk acceptance criteria and acceptance determination,” which requires companies to define unacceptable risk levels. Civil society and independent experts must intervene here, as companies will set inadequate thresholds by default. One approach could include:
Proposing consensus thresholds through surveys and expert workshops, such as the one we're organizing at the next convening of the International Association for Safe and Ethical AI;
Collaborating with frontier labs on harmonization;
Critiquing existing thresholds (e.g., see this critique of OpenAI's risk thresholds).
One way to propose better-grounded risk thresholds would be to follow the methodology from Leonie Koessler's GovAI paper, Risk Thresholds for Frontier AI (figure 1), which consists of 1) fixing acceptable risk thresholds and 2) deriving capability thresholds from them by leveraging the current SoTA in risk modeling.
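The sketch below shows how I read that two-step logic; the risk curve and all numbers are toy placeholders, not values from the paper:

```python
# Step 1: fix an acceptable-risk threshold (hypothetical value).
ACCEPTABLE_RISK_PER_YEAR = 1e-6

def modeled_risk(capability: float) -> float:
    """Toy risk model: annual probability of a severe incident as a function of a
    0-1 capability score on some dangerous-capability benchmark (placeholder curve)."""
    return 1e-8 * 10 ** (4 * capability)

def capability_threshold(acceptable_risk: float, steps: int = 10_000) -> float:
    """Step 2: the lowest capability score at which modeled risk exceeds the threshold."""
    for i in range(steps + 1):
        c = i / steps
        if modeled_risk(c) > acceptable_risk:
            return c
    return 1.0

print(f"Capability threshold implied by the toy model: {capability_threshold(ACCEPTABLE_RISK_PER_YEAR):.2f}")
```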
Forecasters: Creating a Sense of Urgency
Predicting Red Line Breaches: Engage with forecasters to commission a report predicting when current AI models might breach different capability red lines, potentially creating a sense of urgency to catalyze the political process.
In the paper Evaluating Frontier Models for Dangerous Capabilities (DeepMind, 2024), the authors begin by operationalizing self-proliferation with a benchmark and then estimate when AI systems will be able to self-proliferate. There is considerable uncertainty, but most of the probability mass falls before the end of this decade. This is, by default, what I would use to show why we need to act quickly.
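A toy version of such a forecasting exercise is sketched below; the scores are invented placeholders, and a real report would use published evaluation results and a proper treatment of uncertainty rather than a naive line fit:

```python
# Hypothetical dangerous-capability eval scores over time (placeholders, not real data).
years = [2022, 2023, 2024, 2025]
scores = [0.05, 0.15, 0.30, 0.45]   # fraction of self-proliferation tasks solved (invented)
threshold = 0.80                    # hypothetical red-line level

# Ordinary least-squares line fit, written out by hand to avoid dependencies.
n = len(years)
mean_x, mean_y = sum(years) / n, sum(scores) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores)) / \
        sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

# Year at which the fitted trend crosses the red-line threshold.
crossing_year = (threshold - intercept) / slope
print(f"Naive linear extrapolation crosses the threshold around {crossing_year:.1f}")
```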
Call to Action
The window for establishing effective, binding international governance for advanced AI is rapidly closing.