Since the end of (very weak) training scaling laws
Precisely because the scaling laws are somewhat weak, there has been nothing so far to indicate they are ending (the only sense in which they might be ending is running out of text data, but models trained on 2024 compute should still have more than enough). The scaling laws held for many orders of magnitude; they are going to hold for a bit further. That's plausibly not enough, even with something to serve the role of continual learning (beyond in-context learning on ever larger contexts). But there is still another 100x-400x in compute to go, compared to the best models deployed today. Likely the 100x-400x models will be trained in 2029-2031, at which point the pre-AGI funding for training systems mostly plateaus. This is (a bit more than) a full step of GPT-2 to GPT-3, or GPT-3 to original Mar 2023 GPT-4 (after original Mar 2023 GPT-4, and with the exception of GPT-4.5, OpenAI's naming convention no longer tracks pretraining compute). And we still haven't seen such a full step beyond original Mar 2023 GPT-4, only half of a step (10x-25x), out of a total of 3-4 halves-of-a-step for the 2022-2030 training compute ramp (2000x-10,000x in total: the higher end if the BF16 to NVFP4 transition is included, the lower end if even in 2030 there are no 5 GW training systems and somehow BF16 still needs to be used for the largest models).
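As a back-of-the-envelope for this "steps" framing, here is a minimal sketch counting steps in log-compute; the multipliers are the estimates from the paragraph above, not independently sourced figures.

```python
import math

# Back-of-the-envelope for the "steps" framing above, counting steps in
# log-compute. Multipliers are the estimates from the text: a "full step"
# (GPT-2 -> GPT-3, GPT-3 -> GPT-4) is ~100x, a "half step" is 10x-25x.

def ooms(x):
    """Orders of magnitude corresponding to a compute multiplier."""
    return math.log10(x)

half_step = (10, 25)           # Mar 2023 GPT-4 -> best late-2025 models
remaining = (100, 400)         # best deployed today -> ~2029-2031 models
total_ramp = (2000, 10_000)    # 2022 -> 2030 training compute ramp

for name, (lo, hi) in [("half step so far", half_step),
                       ("remaining to 2029-2031", remaining),
                       ("total 2022-2030 ramp", total_ramp)]:
    print(f"{name}: {ooms(lo):.1f}-{ooms(hi):.1f} OOMs")

# The total ramp measured in units of half-steps (10x-25x each):
print("total ramp in half-steps:",
      f"{ooms(total_ramp[0]) / ooms(half_step[1]):.1f}",
      "to",
      f"{ooms(total_ramp[1]) / ooms(half_step[0]):.1f}")
```

This prints roughly 1.0-1.4 OOMs for the half-step so far, 2.0-2.6 OOMs remaining, 3.3-4.0 OOMs for the total ramp, and about 2.4-4.0 half-steps in total, consistent with the "3-4 halves-of-a-step" count above.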
Since original Mar 2023 GPT-4, models that were allowed to get notably larger and made full use of the other contemporary techniques only appeared in late 2025 (likely Gemini 3 Pro and Opus 4.5). These models are probably sized compute optimally for 2024 levels of pretraining compute (as in 100K H100s, 10x-25x the FLOPs of original Mar 2023 GPT-4), might have been pretrained with that amount of compute or a bit more, plus pretraining-scale RLVR. All the other models we've seen so far are either smaller than compute optimal for even 2024 levels of pretraining compute (Gemini 2.5 Pro, Grok 4, especially GPT-5), or didn't get the full benefit of RLVR relative to their pretraining (Opus 4.0, GPT-4.5), and so in some ways looked underwhelming compared to other (smaller) models that were more comprehensively trained.
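As a rough illustration of what "sized compute optimally for 2024 levels of pretraining compute" could mean, here is a minimal sketch assuming the dense Chinchilla heuristic (compute C ≈ 6ND with compute-optimal D ≈ 20N) and ~2e25 FLOPs for original GPT-4; both figures are assumptions used only for illustration, and MoE sparsity and modern training choices shift the optimum.

```python
# Minimal sketch of compute-optimal sizing, assuming the dense Chinchilla
# heuristic C ~= 6*N*D with compute-optimal D ~= 20*N (tokens per active param).
# The ~2e25 FLOPs figure for original GPT-4 and the 20:1 ratio are assumptions
# for illustration; MoE sparsity and modern training choices shift the optimum.

def compute_optimal(C, tokens_per_param=20):
    """Return (active params, training tokens) compute-optimal for C FLOPs."""
    N = (C / (6 * tokens_per_param)) ** 0.5
    return N, tokens_per_param * N

gpt4_flops = 2e25                      # assumed order of magnitude
for mult in (10, 25):                  # "10x-25x the FLOPs of original GPT-4"
    N, D = compute_optimal(mult * gpt4_flops)
    print(f"{mult}x GPT-4 compute: ~{N / 1e12:.1f}T active params, ~{D / 1e12:.0f}T tokens")
```

Under these assumptions this gives on the order of 1-2T active params and tens of trillions of training tokens, which is consistent with "MoE models with many trillions of total params" once sparsity is layered on top.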
The buildout of GB200/GB300 NVL72 will be complete at flagship model scale in 2026, making it possible to easily serve models sized compute optimally for 2024 levels of compute (MoE models with many trillions of total params). More training compute is available now, and more still will be available in 2026, than there was in 2024, but most of the currently available inference hardware can't efficiently serve models sized compute optimally for this compute (at tens of trillions of total params); the exceptions are Ironwood TPUs (which are being built in 2026, for Google and Anthropic) and then Nvidia Rubin Ultra NVL576 (which will only get built in sufficient amounts in 2029, maybe late 2028).
So the next step of scaling will probably come in late 2026 to early 2027 from Google and Anthropic (while OpenAI will only be catching up to the late 2025 models from Google and Anthropic, though of course in 2026 it'll have better methods than Google and Anthropic had in 2025). Training compute will then still continue increasing somewhat quickly until 2029-2031 (with 5 GW training systems, which is at least $50bn per year in training compute, or $100bn per year in total for each AI company if inference consumes half the budget). After Rubin Ultra NVL576 (in 2029), and to some extent even Ironwood (in 2026), inference hardware will no longer be a notable constraint on scaling, and once AI companies are working with 10 GW of compute (half for training, half for inference), pretraining compute will no longer be growing much faster than the price-performance of hardware, which improves much more slowly than the buildout trend of 2022-2026, and even more slowly than the likely ramp-off in 2026-2030. I only expect 2 GW training systems in 2028, rather than the 5 GW that the 2022-2026 trend would ask for by then. But by 2030 the combination of continuing buildout and somewhat better hardware should still reach the levels that would be on-trend for 2028, following 2022-2026.
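A quick conversion implied by the figures above; the dollars-per-GW-year ratio is just the one implied by "5 GW ≈ $50bn per year of training compute" in this text, not a measured price.

```python
# Conversion implied by the figures above: "5 GW ~ $50bn/year of training
# compute" gives ~$10bn per GW-year (an anchor taken from this text, not a
# measured price), doubled if inference consumes the other half of the budget.

usd_per_gw_year = 50e9 / 5

for gw in (2, 5):                       # 2 GW expected in 2028, 5 GW in 2029-2031
    training = gw * usd_per_gw_year
    total = 2 * training
    print(f"{gw} GW training: ~${training / 1e9:.0f}bn/yr training, ~${total / 1e9:.0f}bn/yr total")
```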
A strange attitude towards the physical world can be reframed as caring only about some abstract world that happens to resemble the physical world in some ways. A chess AI could be said to be acting on some specific physical chessboard within the real world while carefully avoiding all concern about everything else, but it's more naturally described as acting on just the abstract chessboard, nothing else. I think values/preferences (for some arbitrary agent) should be not just about probutility over the physical world, but should also specify which world they are talking about, so that different agents are not just normatively disagreeing about the relative value of events, but about which worlds are worth caring about (not just possible worlds within some space of nearby possible worlds, but fundamentally very different abstract worlds), and therefore about what kinds of events (from which sample spaces) ought to serve as semantics for possible actions, before their value can be considered.
A world model (such as an LLM with frozen weights) is already an abstraction; its data is not the same as the physical world itself, but it's coordinated with the physical world to some extent, similarly to how an abstract chessboard is coordinated with a specific physical chessboard in the real world (learning is coordination, adjusting the model so that the model and the world have more shared explanations for their details). Thus acting within an abstract world given by a world model (as opposed to within the physical world itself) might be a useful framing for systematically ignoring some aspects of the physical world, and world models could be intentionally crafted to emphasize particular aspects.
I would term "hope for " rather than "reliability", because it's about willingness to enact in response to belief in , but if is no good, you shouldn't do that. Indeed, for bad , having the property of is harmful fatalism, following along with destiny rather than choosing it. In those cases, you might want to or something, though that only prevents from being believed, that you won't need to face in actuality, it doesn't prevent the actual . So reflects a value judgement about reflected in agent's policy, something downstream of endorsement of , a law of how the content of the world behaves according to an embedded agent's will.
Payor's Lemma then talks about belief in hope , that is hope itself is exogenous and needs to be judged (endorsed or not). Which is reasonable for games, since what the coalition might hope for is not anyone's individual choice, the details of this hope couldn't have been hardcoded in any agent a priori and need to be negotiated during a decision that forms the coalition. A functional coalition should be willing to act on its own hope (which is again something we need to check for a new coalition, that might've already been the case for a singular agent), that is we need to check that is sufficient to motivate the coalition to actually . This is again a value judgement about whether this coalition's tentative aspirations, being a vehicle for hope that , are actually endorsed by it.
Thus I'd term "coordination" rather than "trust", the fact that this particular coalition would tentatively intend to coordinate on a hope for . Hope is a value judgement about , and in this case it's the coalition's hope, rather any one agent's hope, and the coalition is a temporary nascent agency thing that doesn't necessarily know what it wants yet. The coalition asks: "If we find ourselves hoping for together, will we act on it?" So we start with coordination about hope, seeing if this particular hope wants to settle as the coalition's actual values, and judging if it should by enacting if at least coordination on this particular hope is reached, which should happen only if is a good thing.
(One intuition pump with some limitations outside the provability formalism is treating as "probably ", perhaps according to what some prediction market tells you. If "probably " is enough to prompt you to enact , that's some kind of endorsement, and it's a push towards increasing the equilibrium-on-reflection value of probability of , pushing "probably " closer to reality. But if is terrible, then enacting it in response to its high probability is following along with self-fulfilling doom, rather doing what you can to push the equilibrium away from it.)
Löb's Theorem then says that if we merely endorse a belief by enacting the believed outcome, this is sufficient for the outcome to actually happen, a priori and without that belief yet being in evidence. And Payor's Lemma says that if we merely endorse a coalition's coordinated hope by enacting the hoped-for outcome, this is sufficient for the outcome to actually happen, a priori and without the coordination around that hope yet being in evidence. The use of Löb's Theorem or Payor's Lemma is that the condition (belief in , or coordination around hope for ) should help in making the endorsement, that is it should be easier to decide to if you already believe that , or if you already believe that your coalition is hoping for . For coordination, this is important because every agent can only unilaterally enact its own part in the joint policy, so it does need some kind of premise about the coalition's nature (in this case, about the coalition's tentative hope for what it aims to achieve) in order to endorse playing its part in the coalition's joint policy. It's easier to decide to sign an assurance contract than to unconditionally donate to a project, and the role of Payor's Lemma is to say that if everyone does sign the assurance contract, then the project will in fact get funded sufficiently.
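For reference, the standard statements being used here, with $\Box$ read as provability (or belief), $\Box X \to X$ as the hope/endorsement, and $\Box(\Box X \to X)$ as coordination around that hope:

$$\text{Löb's Theorem:}\quad \vdash \Box X \to X \;\Longrightarrow\; \vdash X$$

$$\text{Payor's Lemma:}\quad \vdash \Box(\Box X \to X) \to X \;\Longrightarrow\; \vdash X$$

In both cases the premise is an endorsement conditioned on something not yet in evidence (the belief, or the coordinated hope), and the conclusion is that the outcome obtains anyway.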
1. Plan A is to race to build a Friendly AI before someone builds an unFriendly AI.
[...] Eliezer himself is now trying hard to change 1
This is not a recent development, as a pivotal act AI is not a Friendly AI (which would be too difficult), but rather something like an AI that enforces a lasting AI ban/pause without killing everyone, or an AI that uploads humans and does nothing else, which is where you presumably need decision theory, but not ethics, metaethics, or much of broader philosophy.
The "ten people on the inside" direct AIs to useful projects within their resource allocation. The AGIs themselves direct their own projects according to their propensities, which might be influenced by publicly available Internet text, possibly to a greater extent if it's old enough to be part of pretraining datasets.
The amount of resources that AGIs direct on their own initiative might dwarf the resources of the "ten people on the inside", so the impact of openly published technical plans (that make sense on their own merits) might be significant. While AGIs could come up with any of these ideas independently, path dependence during the acute risk period might still make their initial propensities to pay attention to particular plans matter.
You can't really have a technical "Plan E" because there is approximately no one to implement the plan
AGIs themselves will be implementing some sort of plan (perhaps at very vague and disorganized prompting from humans, or without any prompting at all), and that plan might be influenced by blog posts and such in publicly available Internet text. This could be relevant for mitigating ASI misalignment if these AGIs are sufficiently aligned to the future of humanity, more so than some of the hypothetical future ASIs (created without following such a plan).
What happens with gradual disempowerment in this picture? Even Plan A seems compatible with handing off increasing levels of influence to AIs. One benefit of "shut it all down" (AGI Pause) is ruling out this problem by not having AGIs around (at least while the Pause lasts, which is also when the exit strategy needs to be prepared, not merely technical alignment).
Gradual disempowerment risks transitioning into permanent disempowerment (if not extinction), where a successful solution to technical ASI-grade alignment by the AIs might result in the future of humanity surviving, but only getting a tiny sliver of resources compared to the AIs, with no way of ever changing that even on cosmic timescales. Permanent disempowerment doesn't even need to involve a takeover.
Also, in the absence of "shut it all down", at some point targeting misalignment risks might be less impactful on the margin than targeting improvements in education (about AI risks and cruxes of mitigation strategies), coordination technologies, and AI Control. These enable directing more resources to misalignment risk mitigation as appropriate, including getting back to "shut it all down", a more robust ASI Pause, or making creation of increasingly capable AGIs non-lethal if misaligned (not a "first critical try").
My point is that the 10-30x AIs might be able to be more effective at coordination around AI risk than humans alone, in particular more effective than currently seems feasible in the relevant timeframe (when not taking into account the use of those 10-30x AIs). Saying "labs" doesn't make this distinction explicit.
with 10-30x AIs, solving alignment takes like 1-3 years of work ... so a crucial factor is US government buy-in for nonproliferation
Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing. Directing AIs to pursue such projects could be a priority comparable to direct alignment work. Unclear how visibly asymmetric such interventions will prove to be, but then alignment vs. capabilities work might be in a similar situation.
I'm responding to the claim that training scaling laws "have ended", even as the question of "the bubble" might be relevant context. The claim isn't very specific, and useful ways of making it specific seem to make it false, either in itself or in the implication that the observations so far have something to say in support of the claim.
The scaling laws don't depend on how much compute we'll be throwing at training or when; they predict how perplexity depends on the amount of compute. For scaling laws in this sense to become false, we'd need to show that perplexity starts depending on compute in some different way (with more compute). Not having enough compute doesn't disprove the scaling laws. Even not having enough data doesn't disprove them.
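For concreteness, one standard published form of such a law is the Chinchilla parametrization (Hoffmann et al. 2022), cited here only as an example of what "how perplexity depends on compute" means, not as something this argument relies on:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \approx 6ND,$$

where $N$ is parameter count, $D$ is training tokens, and $E, A, B, \alpha, \beta$ are fitted constants. "Ending" in this sense would mean the fitted dependence of $L$ on compute-optimal $(N, D)$ visibly breaking down at larger $C$, which hasn't been observed.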
For practical purposes, scaling laws could be said to fail once they can no longer be exploited for making models better. As I outlined, there's going to be significantly more compute soon (this remains the case with "a bubble", which might cut compute by as much as 3x relative to the more optimistic 200x-400x projection for models by 2031, compared to the currently deployed models). The text data is plausibly in some trouble even for training with 2026 compute, and likely in a lot of trouble for training with 2028-2030 compute. But this hasn't happened yet, so the claim of scaling laws "having ended", past tense, would still be false in this sense. Instead, there would be a prediction that the scaling laws would in some practical sense end in a few years, before compute stops scaling even at pre-AGI funding levels. But also, the data efficiency I'm using to predict that text data will be insufficient (even with repetition) is a product of public pre-LLM-secrecy research that almost always took unlimited data for granted, so it's possible that spending a few years explicitly searching for ways to overcome data scarcity will let AI companies sidestep this issue, at least until 2030. Thus I wouldn't even predict with a high degree of certainty that text data will run out by 2030; it's merely my baseline expectation.
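For a rough sense of why text data gets tight at that scale, here is a sketch reusing the Chinchilla-style heuristic from earlier; the compute range takes 100x-400x over today's best deployed models (themselves assumed at roughly 2e26-5e26 FLOPs), and the figures for unique useful web text and tolerable repetition are illustrative assumptions, not established numbers.

```python
# Why text data gets tight at 2028-2030 compute, reusing the Chinchilla-style
# heuristic (compute-optimal D ~= 20*N, C ~= 6*N*D). The compute range takes
# 100x-400x over today's best deployed models at ~2e26-5e26 FLOPs; the ~50T
# figure for useful unique web text and the ~4-epoch repetition allowance are
# illustrative assumptions, not established numbers.

def optimal_tokens(C, tokens_per_param=20):
    N = (C / (6 * tokens_per_param)) ** 0.5
    return tokens_per_param * N

for C in (2e28, 2e29):                  # rough 2028-2031 pretraining compute range
    print(f"C = {C:.0e} FLOPs: ~{optimal_tokens(C) / 1e12:.0f}T compute-optimal tokens")

unique_text = 50e12                     # assumed useful unique text tokens
print(f"vs ~{unique_text / 1e12:.0f}T unique tokens,"
      f" ~{4 * unique_text / 1e12:.0f}T with ~4 epochs of repetition")
```

Under these assumptions the compute-optimal token counts land in the hundreds of trillions, well above what unique text plus modest repetition provides, which is the baseline expectation behind the data-scarcity worry; better data efficiency would move the crossover point.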
I said nothing about qualitative improvements. Sufficiently good inference hardware makes it cheap to make models a lot bigger, so if there is some visible benefit at all, this will be happening at the pace of the buildouts of better inference hardware. Conversely, if there's not enough inference hardware, you physically can't serve something as a frontier model (for a large user base) even if it offers qualitative improvements, unless you restrict demand (with very high prices or rate limits).
This is not very specific, similarly to the claim about training scaling laws "having ended". Even with "a bubble" (that bursts before 2031), some AI companies (like Google) might survive in an OK shape. These companies will also have their pick of the wreckage of the other AI companies, including both researchers and the almost-ready datacenter sites, which they can use to make their own efforts stronger. The range of scenarios I outlined only needs 2-4 GW of training compute by 2030 for at least one AI company (in addition to 2-4 GW of inference compute), which revenues of $40-80bn should be sufficient to cover (especially as the quality of inference hardware stops being a bottleneck, so that even older hardware will again become useful for serving current frontier models). Google has been spending this kind of money on datacenter capex as a matter of course for many years now.
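A consistency check on the revenue figure, reusing the ~$10bn per GW-year ratio implied earlier by "5 GW ≈ $50bn per year" (an assumption derived from this text, not a measured price):

```python
# Consistency check on the revenue range above, using the ~$10bn per GW-year
# ratio implied by the earlier "5 GW ~ $50bn/year" figure (an assumption
# derived from this text, not a measured price).

usd_per_gw_year = 10e9

for training_gw, inference_gw in [(2, 2), (4, 4)]:
    need = (training_gw + inference_gw) * usd_per_gw_year
    print(f"{training_gw} GW training + {inference_gw} GW inference: ~${need / 1e9:.0f}bn/year")
# -> ~$40bn/year to ~$80bn/year, matching the $40-80bn revenue range.
```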
OpenAI is projecting about $20bn of revenue in their current state, with the 800M+ free users not being monetized (which is likely to change). These numbers can plausibly grow to give at least $50bn per year to the leading model company by 2030 (even if it's not OpenAI); this seems like a very conservative estimate. It doesn't depend on qualitative improvements in LLMs or on promises of more than a trillion dollars in datacenter capex. Also, the capex numbers might even scale down gracefully if $50bn per year from one company by 2030 turns out to be all that's actually available.