Unsure, partially noticing my own confusion. Hoping Cunningham's Law can help resolve it.
Some MIRI people (e.g. Rob Bensinger) still highlight EU maximisers as the paradigm case for existentially dangerous AI systems. I'm confused by this for a few reasons:
I don't expect the systems that matter (in the par-human or strongly superhuman regime) to be expected utility maximisers. I think arguments for AI x-risk that rest on expected utility maximisation are mostly disconnected from reality. I suspect that discussing the perils of expected utility maximisation in particular, as opposed to e.g. dangers from powerful (consequentialist?) optimisation processes, is somewhere between a distraction and actively harmful.
I do not think expected utility maximisation is the limit of what generally capable optimisers look like.
I don't think the case for existential risk from AI rests on expected utility maximisation. I kind of stopped alieving in expected utility maximisers a while back (only recently have I synthesised explicit beliefs that reject them), but I still plan on working on AI existential safety, because I don't see the core threat as resulting from expected utility maximisation.
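For concreteness, the standard notion under discussion (notation mine): an expected utility maximiser is an agent that selects

$$a^* = \operatorname*{arg\,max}_{a \in \mathcal{A}} \; \mathbb{E}_{o \sim P(\cdot \mid a)}\big[U(o)\big] = \operatorname*{arg\,max}_{a \in \mathcal{A}} \sum_{o} P(o \mid a)\, U(o)$$

where $\mathcal{A}$ is the set of available actions, $P(o \mid a)$ the agent's beliefs about outcomes, and $U$ a single fixed utility function over outcomes. The VNM theorem says an agent whose preferences are complete, transitive, continuous, and independent behaves this way; drop completeness and the representation no longer holds.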
The reasons I consider AI an existential threat mostly rely on:
I do not actually expect extinction in the near term, but extinction is not the only kind of "existential catastrophe".
I optimised for writing this quickly, so my language may be stronger/more confident than I actually feel. I may not have spent as much time accurately communicating my uncertainty as was warranted.
Correct me if I'm mistaken, but I'm under the impression that RL is the main training paradigm we have that selects for agents.
I don't necessarily expect that our most capable systems would be trained via reinforcement learning, but I think our most agentic systems would be.
There may be a significant opportunity cost from diverting attention away from other, more plausible pathways to doom.
In general, I think exposing people to bad arguments for a position is a poor persuasive strategy: people who dismiss those bad arguments may (rationally) update downwards on the credibility of the position itself.
I don't necessarily think agents are that limit either. But as "Why Subagents?" shows, expected utility maximisers aren't the limit of idealised agency.
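To make the "Why Subagents?" point concrete, here is a minimal sketch (my own illustrative code, not from that post): a "committee" agent whose preferences are the Pareto order over two subagents' utilities. Such preferences are incomplete, so no single utility function represents the committee, yet it is not money-pumpable in the usual sense because it only ever accepts strict improvements.

```python
def committee_prefers(a, b, utilities):
    """The committee strictly prefers a to b only when every subagent
    weakly prefers a and at least one strictly prefers it (Pareto order)."""
    return (all(u(a) >= u(b) for u in utilities)
            and any(u(a) > u(b) for u in utilities))

# Two hypothetical subagents, each caring about one resource.
u1 = lambda x: x[0]
u2 = lambda x: x[1]

A, B = (2, 1), (1, 2)

# Neither trade is a Pareto improvement, so the committee refuses both
# directions: A and B are incomparable, not merely equally valued.
print(committee_prefers(A, B, [u1, u2]))        # False
print(committee_prefers(B, A, [u1, u2]))        # False

# But it is not indifferent in general: it accepts genuine improvements.
print(committee_prefers((3, 3), A, [u1, u2]))   # True
```

No single real-valued utility function can reproduce this pattern: if $U(A) \ge U(B)$ or $U(B) \ge U(A)$ (one must hold), the agent would be willing to trade in at least one direction, which the committee is not.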