There have recently been various proposals for mitigations to "the intelligence curse" or "gradual disempowerment"—concerns that most humans would end up disempowered (or even dying) because their labor is no longer valuable. I'm currently skeptical that the typically highlighted prioritization and interventions are best and I have some alternative proposals for relatively targeted/differential interventions which I think would be more leveraged (as in, the payoff is higher relative to the difficulty of achieving them).

It's worth noting I doubt that these threats would result in huge casualty counts (due to e.g. starvation) or disempowerment of all humans (though substantial concentration of power among a smaller group of humans seems quite plausible).[1] I decided to put a bit of time into writing up my thoughts out of general cooperativeness (e.g., I would want someone in a symmetric position to do the same).

(This was a timeboxed effort of ~1.5 hr, so apologies if it is somewhat poorly articulated or otherwise bad. Correspondingly, this post is substantially lower effort than my typical post.)

My top 3 preferred interventions focused on these concerns are:

  • Mandatory interoperability for alignment and fine-tuning: Pass regulation or create a norm that requires AI companies to support all the APIs and interfaces needed to customize their models and (attempt to) align them differently. Either third parties would inspect the implementation (to avoid tampering and to ensure sufficient affordances) or, perhaps more robustly, the companies would be required to submit their weights to various (secure) third parties that would implement the relevant APIs. Then, many actors could compete in offering differently fine-tuned models, competing on the level of alignment (and the level of alignment to users in particular). This would use relatively deep model access (not just prompting), e.g. full weight fine-tuning APIs that support arbitrary forward and backward passes, per-token losses, adding new heads/probes, and more generally whatever access is needed for alignment methods; a rough sketch of what such an API surface might look like appears after this list. (Things like steering vectors could be supported, but currently wouldn't be important as they aren't the state of the art for typical usage.) The hope here would be to get the reductions in concentration of power that come from open source while simultaneously being more likely to be feasible given incentives (and not eating big security downsides). This proposal could allow AI companies to profit roughly as much while greatly increasing competition in the parts of the tech stack which are most relevant to disempowerment. David Bau and others are pursuing a similar direction in NDIF, though they are (initially?) more focused on providing interpretability model access rather than customization. Examples of this in other industries/cases include: vertical unbundling or vertical separation in telecoms/utilities, the Chromium back end which supports many different front ends (including Chrome), and mandatory interoperability proposals in social media. To prevent mass casualties (which are worse than the problem we're aiming to solve), you'd probably need some layer to prevent bioweapons (and similar) misuse, but you could try to ensure tighter restrictions on just this. If you're a bullet-biting libertarian, you could just accept the mass fatalities. You'd need to prevent AI companies from having control: ideally, there would be some misuse standard companies comply with, and then the AI companies would have to sell model access / fine-tuning as a commodity without terms of service that give them substantial control. (Or minimally, the terms of service and process for banning users would need to be transparent.)
  • Aligning AI representatives / advisors to individual humans: If every human had a competitive and aligned AI representative which gave them advice on how to advance their interests, as well as directly pursuing those interests based on their direction (and this happened early, before people were disempowered), this would resolve most of these concerns. People could strategy-steal effectively (e.g. investing their wealth competitively with other actors) and notice if something would result in them being disempowered (e.g., noticing when a company is no longer well controlled by its shareholders or when a democracy will no longer be controlled by the voters). Advisors should also point out cases where a person might change their preference on further reflection or where someone seems to be making a mistake relative to their own preferences. (Of course, if someone didn't want this, they could also ask not to get this advice.) It's somewhat unclear whether good AI advisors could arrive before much of the disempowerment has already occurred, but it seems plausible to accelerate this and to create pressure on AI companies to have AIs (which are supposed to be) aligned for this purpose. This is synergistic with the above bullet. As in the above bullet, you would need additional restrictions to prevent massive harm due to bioweapons and similar misuse.
  • Improving societal awareness: Generally improving societal awareness seems pretty helpful, so transparency, ongoing deployment of models, and capability demos all seem good. This is partially to push for the prior interventions and so that people negotiate (potentially as advised by AI advisors) while they still have power. On my mainline views, takeoff is maybe too fast for this to look as good as it otherwise would, but it seems particularly good if you share the takeoff speed views that tend to be held by people who worry most about intelligence curse / gradual disempowerment type concerns.
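
To make the "deep model access" point in the first bullet more concrete, here is a minimal sketch of what such a fine-tuning/interoperability API surface might look like. Everything here is hypothetical and illustrative (the class, method names, and loss function are invented, not any provider's actual API); the point is just that callers would get full-weight gradients, caller-defined per-token losses, and the ability to attach probes, rather than only a prompting endpoint.

```python
# Hypothetical sketch of a "deep model access" interoperability API.
# All names below are invented for illustration; this is not a real provider's API.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class DeepAccessClient:
    """Stand-in for a client talking to a (secure) third party hosting the weights."""

    model_id: str

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        """Arbitrary forward pass, returning per-token logits."""
        raise NotImplementedError  # would be served remotely in this sketch

    def backward(self, loss: torch.Tensor) -> None:
        """Backpropagate a caller-defined loss through the full weights."""
        raise NotImplementedError

    def add_probe(self, layer: int, probe: torch.nn.Module) -> str:
        """Attach a new head/probe at a chosen layer; returns a handle for later use."""
        raise NotImplementedError

    def apply_update(self, learning_rate: float) -> None:
        """Commit accumulated gradients as a full-weight fine-tuning step."""
        raise NotImplementedError


def per_token_loss(logits: torch.Tensor, targets: torch.Tensor,
                   token_weights: torch.Tensor) -> torch.Tensor:
    """Caller-defined loss with a separate weight per token, e.g. to emphasize
    tokens relevant to the caller's own alignment objective."""
    per_token = F.cross_entropy(
        logits.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(targets)
    return (per_token * token_weights).mean()


# One hypothetical fine-tuning step with a custom per-token loss:
#   client = DeepAccessClient(model_id="some-frontier-model")
#   logits = client.forward(input_ids)
#   client.backward(per_token_loss(logits, targets, token_weights))
#   client.apply_update(learning_rate=1e-5)
```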

Some things which help with the above:

  • Deploying models more frequently (ideally also through mandatory interoperability). (Companies could be pressured into this etc.) This increases awareness and reduces the capability gap between AI representatives and internally available AIs.

Implicit in my views is that the problem would be mostly resolved if people had aligned AI representatives which helped them wield their (current) power effectively.

To be clear, something like these interventions has been highlighted in prior work, but I have a somewhat different emphasis and prioritization and I'm explicitly deprioritizing other interventions.

Deprioritized interventions and why:

  • I'm skeptical of generally diffusing AI into the economy, working on systems for assisting humans, and generally uplifting human capabilities. This might help some with societal awareness, but doesn't seem like a particularly leveraged intervention for this. Things like emulated minds and highly advanced BCIs might help with misalignment, but otherwise seem worse than AI representatives (which aren't backdoored and don't have secret loyalties/biases).
  • Localized AI capabilities and open source seem very hard to make happen in a way which is competitive with big AI companies. And forcing AI companies to open-source their models through high levels of capability seems hard and also very dangerous with respect to other concerns (like misalignment, bioweapons, and takeoff generally being faster than we can handle). It seems more leveraged to work on mandatory interoperability. I'm skeptical that local data is important. I agree that easy (and secure/robust) fine-tuning is helpful, but disagree that this gives local actors a (non-trivial) comparative advantage.
  • Generally I'm skeptical of any proposal which looks like "generally make humans+AI (rather than just AI) more competitive and better at making money/doing stuff". I discuss this some here. There are already huge incentives in the economy to work on this, and the case for it helping is that it somewhat prolongs the period where humans are adding substantial economic value but powerful AIs also exist (at least that's the hypothesis). I think general economic acceleration like this might not even prolong a relevant human+AI period (at least by much), because it also speeds up capabilities, and adoption in the broader economy would likely be limited anyway! Cursor isn't a positive development for these concerns IMO.
  • I agree that AI-enabled contracts, AI-enabled coordination, and AIs speeding up key government processes would be good (to preserve some version of rule of law such that hard power is less important). It seems tricky to advance this now.
  • Advocating for wealth redistribution and keeping democracy seems good, but probably less leveraged to work on now. It seems good to mention when discussing what should happen though.
  • Understanding agency, civilizational social processes, and how you could do “civilizational alignment” seems relatively hard, and single-single aligned AI advisors/representatives could study these areas as needed (coordinating research funding across many people).
  • I don't have a particular take on meta interventions like "think more about what would help with these risks and how these risks might manifest", I just wanted to focus on somewhat more object level proposals.

(I'm not discussing interventions targeting misalignment risk, biorisk, or power grab risk, as these aren't very specific to this threat model.)

Again, note that I'm not particularly recommending these interventions on my views about the most important risks, just claiming these are the best interventions if you're worried about "intelligence curse" / "gradual disempowerment" risks.

  1. ^

    That said, I do think that technical misalignment issues are pretty likely to disempower all humans and I think war, terrorism, or accidental release of homicidal bioweapons could kill many. That's why I focus on misalignment risks.

Comments (4)

My response to the alignment / AI representatives proposals:

Even if AIs are "baseline aligned" to their creators, this doesn't automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, "You are messing up, please coordinate with other nations/groups, stop what you are doing" requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI were the correct thing to do, Claude is not going to suggest this to Dario for obvious reasons. As we've seen even with entirely human systems (the Trump administration and tariffs), possessing information or even offering correct advice doesn't guarantee it will be heeded or lead to effective collective action.

[...] "Politicians...will remain aware...able to change what the system is if it has obviously bad consequences." The climate change analogy is pertinent here. We have extensive scientific consensus, an "oracle IPCC report", detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are "obviously bad." The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.

Extract copy pasted from a longer comment here.

I don't see how these help.

First, it seems to me that interoperability + advisors would be useless for helping people sociopolitically maneuver precisely up until the point the AI models are good enough to disempower most of them. Imagine some not-particularly-smart person with no AI expertise and not much mental slack for fighting some abstract battle about the future of humanity. A demographic of those people is then up against the legal teams of major corporations and the departments of major governments navigating the transition. The primary full-time jobs of the people working in the latter groups, at which they'd be very skilled, would be figuring out how to disempower the former demographics. In what case do the at-risk demographics stand any chance?

Well, if the AI models are good enough that the skills and the attentions of the humans deploying them don't matter. Which is to say: past the point at which the at-risk demographics are already disempowered.

Conversely, if the AI capabilities are not yet there, then different people can use AIs more or less effectively depending on how smart and skilled they are, and how much resources/spare attention for the fight they have. In which case the extant powers are massively advantaged and on-expectation win.

Second, I'm skeptical about feasibility. This requires, basically, some centralized distribution system which (1) faithfully trains AI models to be ultimately loyal to their end advisees, and (2) fully subsidizes the compute costs for serving these models to all the economically useless people (... in the US? ... in the world?). How is that centralized system not subverted/seized by the extant powers? (E.g., the government insisting, in a way that sounds surface-level reasonable, on ultimate loyalty to its laws first, which it can then freely rewrite to have complete control.)

Like, suppose this system is set up some time before AI models are good enough to make human workers obsolete. As per the first point, the whole timespan prior to the AI models becoming that good would involve the entrenched powers successfully scheming towards precisely the gradual disempowerment we're trying to prevent. How do we expect this UBI-like setup[1] to still be in place, uncorrupted, by the point AI models become good enough, given that it is the largest threat to the ability of extant powers to maintain/increase their power? This would be the thing everyone in power is trying to dismantle. (And, again, unless the AIs are already good enough to disempower the risk demographics, the risk demographics plus their AI representatives would be massively worse at fighting this battle than their opponents plus their AI representatives.)

Third: Okay, let's suppose the system is somehow in place, and everyone in the world has a loyal AI representative advising them and advocating for their interests. What are these representatives supposed to do? Like, imagine some below-average-intelligence menial-labor worker hopelessly outclassed by robots. What moves is their AI representative supposed to make to preserve their power and agency? They don't really have any resources to bargain with; that's the ground truth.

Will the AIs organize them to march to the military bases/datacenters/politicians' and CEOs' homes and physically seize the crucial resources/take the key players hostage, or what?


There was a thought experiment going around on Twitter a few months back, which went:

Suppose people have to press one of two buttons, Blue and Red. If more than 50% of people press Blue, nobody dies. If more than 50% press Red, everyone who pressed Blue dies. Which button do you press?

Red is guaranteed to make you safe, and in theory, if everyone pressed Red, everyone would be safe. But if we imagine something isomorphic to this happening in reality, we should note that getting 100% of people to make the correct choice is incredibly hard. Some would be distracted, or confused, or they'd slip and fall and smack the wrong button accidentally.

To me, all variations on "let's give everyone in the world an AGI advisor to prevent disempowerment!" read as "let's all play Red!". The details don't quite match (50% survival/non-disempowerment rate is not at all guaranteed), but the vibe and the failure modes are the same.

  1. ^

    Which is also supposed to be implemented during Trump's term?

Note: as a general policy I'm not planning on engaging with the comments here. This is just because I don't want to spend a bunch of time on this topic, and it could easily eat a bunch of time. Sorry about that.

Do states and corporations also have their aligned representatives? Is the cognitive power of the representatives equal, roughly equal, or wildly unequal? If it is unequal, why are the resulting equilibria pro-human? (I.e., if I imagine individual humans like me represented by e.g. GPT4 while the government runs tens of thousands of o4s, I would expect my representative to get convinced of whatever the government wants.)
