Rohin Shah argues that many common arguments for AI risk (about the perils of powerful expected utility maximizers) are actually arguments about goal-directed behavior or explicit reward maximization, neither of which is implied by coherence arguments. An AI system could be an expected utility maximizer without being goal-directed or an explicit reward maximizer.
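The core observation can be made concrete with a toy construction (my own illustration, not code from Shah's post): for any deterministic policy, we can define a utility function that the policy maximizes, so "is an EU maximizer" places no constraint on behavior by itself.

```python
# Toy construction: rationalize an arbitrary policy as expected-utility
# maximization by scoring exactly the (observation, action) pairs it produces.
def make_rationalizing_utility(policy, observations):
    """Return a utility over (observation, action) pairs that the policy maximizes."""
    chosen = {(obs, policy(obs)) for obs in observations}
    return lambda obs, act: 1.0 if (obs, act) in chosen else 0.0

# A "twitching" policy with no coherent goal: it just reverses its observation.
policy = lambda obs: obs[::-1]
observations = ["ab", "cd"]
u = make_rationalizing_utility(policy, observations)

# The policy maximizes u, yet nothing about it is goal-directed
# in the intuitive sense.
for obs in observations:
    best = max(["ab", "ba", "cd", "dc"], key=lambda a: u(obs, a))
    assert best == policy(obs)
```

The construction is degenerate on purpose: it shows that the work in risk arguments is being done by goal-directedness, not by EU maximization per se.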
I don’t think that the updateful EDT behaviour in e.g. the calculator example is obviously problematic. Certainly not clearly worse than the alternative of just optimizing relative to the prior (cf. Anthony's post).
I do think that the buy-and-copy behaviour from your example is bad, but it is bad because of how EDT manages the news, not because of the combination of EDT and anthropic updating per se. A counterfactual theory like FDT or TDT doesn’t manage the news and so doesn’t use the buy-and-copy strategy, AIUI. (Maybe analogous cases could be const
ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development.
We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action.
We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that a yearly budget of approximately $50 million would give us a concrete chance of achieving this in the next few years.
In this post, we lay out some of the reasoning behind this estimate, and explain how additional funding past that threshold, up to and beyond $500 million, would continue...
Connor Leahy, what do you think about moving to America?
In 2016, Zhang et al. showed that deep neural networks can achieve zero training loss on randomly labeled data.
This was a Big Deal.
It meant that existing generalization theory couldn't explain why deep neural networks generalize. That's because classical approaches to proving that a given model class (=neural network architecture) would generalize involved showing that it lacks the expressivity to fit noise. If a model class can fit noise arbitrarily well, the resulting bounds break.
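The problem can be seen without deep learning at all: any sufficiently expressive model class fits random labels exactly, making capacity-based bounds vacuous. A minimal sketch using an overparameterized linear model as a stand-in for a network (an assumed example, not from Zhang et al.):

```python
# With more parameters than samples, even a linear model fits arbitrary
# random labels exactly, so a bound that relies on the model class being
# unable to fit noise says nothing here.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 20, 100          # more parameters than data points
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 2, size=n_samples).astype(float)  # random labels

# Least-squares "training" achieves (numerically) zero training loss.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
train_loss = np.mean((X @ w - y) ** 2)
print(train_loss)  # ~0 up to floating-point error
```

A deep network does the same thing at scale: memorizing noise is easy once the parameter count is large enough, so expressivity alone can't be what separates models that generalize from models that don't.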
So something needed to change.
Evidently, you can't prove tight generalization bounds for entire model classes, so theorists turned to studying generalization bounds for individual models within a model class. If you can empirically show that a model's performance doesn't change substantially when you perturb it (by adding noise to the inputs, weights, training samples, etc.), then you can theoretically prove that that model...
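The empirical step in that program is straightforward to sketch. Here's a toy version using a linear model (the actual bounds in this literature are PAC-Bayes-style and apply to networks; this only illustrates measuring robustness to weight perturbation):

```python
# Measure how much a fitted model's training loss changes under small
# Gaussian weight perturbations: a small increase indicates a "flat",
# perturbation-robust model.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=200)

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # unique least-squares minimizer
base_loss = np.mean((X @ w - y) ** 2)

sigma = 0.01
noisy_losses = [
    np.mean((X @ (w + sigma * rng.normal(size=10)) - y) ** 2)
    for _ in range(100)
]
sharpness = np.mean(noisy_losses) - base_loss
print(base_loss, sharpness)  # sharpness stays small for a robust model
```

Because `w` globally minimizes the training loss here, `sharpness` is guaranteed non-negative; the interesting empirical question for networks is how small it stays at realistic noise scales.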
Coming back years later to say: People in 2016 (when the Zhang et al paper was first released) did already know that neural networks were expressive (the work demonstrating neural networks with very high VC dimension occurred in the late 90s and early 2000s).
The hope at the time was not that neural networks themselves lack the expressivity, but that some combination of neural networks + SGD, or neural networks + weight decay, or something else people were doing on top of neural networks, induced a strong prior against being able to fit random data points. The...
My colleague Manish did a lot more analysis here. The main takeaway so far is categorizing each PR's improvements as "deep" vs "shallow", as well as "imported-from-literature" vs "invented".
It looks like there were large, shallow improvements imported from the literature early on, whereas more recent improvements have mostly been moderately involved, with a larger portion being novel.


To get more evidence about SIE likelihood, we have lots of work in the pipeline, including interviews with nanogpt contributors, 1B+ token runs using Opus 4.7 and GPT-5.5 on our Inspec...
Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions). [2] I disagree.
Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult/larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't...
If you would have predicted 15% for Agent-2, what would you have predicted for Agent-1 and Agent-0 levels? Presumably less than 15%?
In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as a scenario forecast, but for the present (which is already uncertain!) rather than the future. I will generally state my best guess without argumentation and without explaining my level of confidence: some of these claims are highly speculative while others are better grounded, and some will certainly be wrong. I tried to make it clear which claims are relatively speculative by saying something like "I guess", "I expect", etc. (but I may have missed some).
You can think of this post more as a list of my current views than as a structured post with a thesis, but I think it...
I now expect a ~3.5-hour 80%-reliability time horizon (on the METR benchmark) rather than ~2.5 hours, based on this extrapolation. I did a quick-and-dirty extrapolation using the gap from Opus 4 to Opus 4.6 to get my original estimate, but it looks like 4 was maybe above trend relative to ECI and 4.6 was below trend.
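For readers unfamiliar with this kind of extrapolation, here is the mechanics only; the dates and horizons below are hypothetical placeholders, not METR data or my actual inputs:

```python
# Fit exponential growth through two (time, horizon) points and project
# forward. All numbers below are made up for illustration.
from math import log

def project_horizon(t0, h0, t1, h1, t_target):
    """Fit h(t) = h0 * 2**((t - t0) / doubling) through two points, then project."""
    doubling = (t1 - t0) * log(2) / log(h1 / h0)
    return h0 * 2 ** ((t_target - t0) / doubling)

# Hypothetical: horizon grew from 1.0h to 2.0h over a year (one doubling/year),
# projected one further year out.
h = project_horizon(2024.0, 1.0, 2025.0, 2.0, 2026.0)
print(h)  # 4.0 under these made-up inputs
```

The point of the snippet is only that a two-point fit like this is sensitive to either endpoint being off-trend, which is exactly the issue with using the Opus 4 to 4.6 gap.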