Vladimir_Nesov — AI Alignment Forum

My point is that the 10-30x AIs might be able to be more effective at coordination around AI risk than humans alone, in particular more effective than currently seems feasible in the relevant timeframe (when not taking into account the use of those 10-30x AIs). Saying "labs" doesn't make this distinction explicit.

Zach Stein-Perlman's Shortform

Vladimir_Nesov2d50

with 10-30x AIs, solving alignment takes like 1-3 years of work ... so a crucial factor is US government buy-in for nonproliferation

Those AIs might be able to lobby for nonproliferation or do things like write a better IABIED, making coordination interventions that oppose myopic racing. Directing AIs to pursue such projects could be a priority comparable to direct alignment work. Unclear how visibly asymmetric such interventions will prove to be, but then alignment vs. capabilities work might be in a similar situation.

If anyone builds it, everyone will plausibly be fine

Vladimir_Nesov9d32

I think such arguments buy us those 5% of no-takeover (conditional on superintelligence soon), and some of the moderate permanent disempowerment outcomes (maybe the future of humanity gets a whole galaxy out of 4 billion or so galaxies in the reachable universe), as distinct from almost total permanent disempowerment or extinction. Though I expect that it matters which specific projects we ask early AGIs to work on, more than how aligned these early AGIs are, basically for the reasons that companies and institutions employing humans are not centrally concerned with alignment of their employees in the ambitious sense, at the level of terminal values. More time to think of better projects for early AGIs, and time to reflect on pieces of feedback from such projects done by early AGIs, might significantly improve the chances for making ambitious alignment of superintelligence work eventually, on the first critical try, however long it takes to get ready to risk it.

If creation of superintelligence is happening on a schedule dictated by economics of technology adoption rather than by taking exactly the steps that we already know how to take correctly by the time we take them, affordances available to qualitatively smarter AIs will get out of control. And their misalignment (in the ambitious sense, at the level of terminal values) will lead them to take over rather than comply with humanity's intentions and expectations, even if their own intentions and expectations don't involve humanity literally going extinct.

If anyone builds it, everyone will plausibly be fine

Vladimir_Nesov9d100

I think AI takeover is plausible. But Eliezer’s argument that it’s more than 98% likely to happen does not stand up to scrutiny

I think the part of the argument where an AI takeover is almost certain to happen if superintelligence^[1] is created soon is extremely convincing (I'd give this 95%), while the part where AI takeover almost certainly results in everyone dying is not. I'd only give 10-30% to everyone dying given an AI takeover (which is not really a decision-relevant distinction, just a major difference in models).

But also the outcome of not dying from an AI takeover cashes out as permanent disempowerment, that is, humanity not getting more than a trivial share of the reachable universe, with AIs instead taking almost everything. It's not centrally a good outcome that a sane civilization should be bringing about, even as it's also not centrally "doom". So the distinction between AI takeover and the book's titular everyone dying can be a crux; the two are not interchangeable.

[1] AIs that are collectively qualitatively better than the whole of humanity at stuff, beyond being merely faster and somewhat above the level of the best humans at everything at the same time.

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro

Vladimir_Nesov24d*65

Non-OpenAI pre-RLVR chatbots might serve as an anchor for how long it takes an AI company to turn an algorithmic idea into a frontier model, after it becomes a clearly worthwhile thing to do. Arguably only Anthropic managed to catch up to OpenAI, and it took them 1.5 years with Sonnet 3.5. Even Google never caught up after 2+ years: their first credibly frontier chatbot is Gemini 2.5 Pro, which is already well into RLVR (and similarly for Grok 4). So it seems reasonable to expect that it would take about 2 years for RLVR-based models to start being done well, somewhere in 2026-2027.

The IMO results probably indicate something about the current lower bound on capabilities in principle, for informally graded tasks such as natural language proofs. This is a lot higher than what finds practical use so far, and improvements in 2026-2027 might be able to capture this kind of thing (without needing the scale of 2026 compute).

Four ways learning Econ makes people dumber re: future AI

Vladimir_Nesov1mo43

When we’re talking about AGI, we’re talking about creating a new intelligent species on Earth, one which will eventually be faster, smarter, better-coordinated, and more numerous than humans.

Here too the labor/capital distinction seems like a distraction. Species or not, it's quickly going to become most of what's going on in the world, probably in a way that looks like "economic prosperity" to humanity (that essentially nobody is going to oppose), but at some point humanity becomes a tiny little ignorable thing in the corner, and then there is no reason for any "takeover" (which doesn't mean there will be survivors).

There is a question of how quickly that happens, but "takeover" or "another species" don't seem like cruxes to me. It's all about scale, and precursors to scale. The fact that catastrophe might be possible in more disputed ways even earlier than that doesn't affect what can be expected a bit later in any case, a few years or even decades down the line.

My AGI timeline updates from GPT-5 (and 2025 so far)

Vladimir_Nesov1mo70

GPT-5 probably isn't based on a substantially better pretrained model which is some evidence that OpenAI thinks the marginal returns from pretraining are pretty weak relative to the returns from RL

The model seems to be "small", but not necessarily with less pretraining in it (in the form of overtraining) than RLVR. There are still no papers I'm aware of on what the compute-optimal (or GPU-time-optimal) pretraining:RLVR ratio might be. Matching GPU-time of pretraining and RLVR results in something like 4:1 (in terms of FLOPs), which would only be compute-optimal (or GPU-time-optimal) by unlikely coincidence.

If the optimal ratio of pretraining:RLVR is something like 1:10 (in FLOPs), then overtraining even smaller models is unimportant. But it could also be more like 40:1, in which case overtraining becomes a must (if inference cost/speed and HBM capacity of the legacy 8-chip servers force the param count to be smaller than compute optimal given the available training compute and the HBM capacity of GB200 NVL72).
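To make the arithmetic behind these ratios concrete, here is a rough sketch of how a GPU-time split maps to a FLOPs ratio. The utilization figures (0.40 for pretraining, 0.10 for RLVR) are hypothetical numbers picked only so that equal GPU-time reproduces the "something like 4:1" figure above; they are not taken from the comment or from any measurement, and the function names are made up for illustration.

```python
def flops_ratio(pretrain_gpu_time, rlvr_gpu_time, pretrain_util, rlvr_util):
    """Pretraining:RLVR ratio of useful FLOPs on the same hardware.

    Useful FLOPs are modeled as GPU-time times compute utilization,
    so the hardware's peak FLOP/s cancels out of the ratio.
    """
    if min(pretrain_gpu_time, rlvr_gpu_time, pretrain_util, rlvr_util) <= 0:
        raise ValueError("all arguments must be positive")
    return (pretrain_gpu_time * pretrain_util) / (rlvr_gpu_time * rlvr_util)


def rlvr_time_share(target_flops_ratio, pretrain_util, rlvr_util):
    """Fraction of total GPU-time spent on RLVR for a given
    pretraining:RLVR FLOPs ratio (e.g. 0.1 for 1:10, 40 for 40:1)."""
    if min(target_flops_ratio, pretrain_util, rlvr_util) <= 0:
        raise ValueError("all arguments must be positive")
    # GPU-time needed per unit of useful FLOPs is 1 / utilization.
    pretrain_time = target_flops_ratio / pretrain_util
    rlvr_time = 1.0 / rlvr_util
    return rlvr_time / (pretrain_time + rlvr_time)


# Hypothetical utilizations, assumed only to match the 4:1 figure.
PRETRAIN_UTIL = 0.40
RLVR_UTIL = 0.10

if __name__ == "__main__":
    # Equal GPU-time for both phases.
    print(flops_ratio(1.0, 1.0, PRETRAIN_UTIL, RLVR_UTIL))
    # GPU-time share of RLVR under the two ratios discussed above.
    for ratio in (0.1, 40.0):
        print(ratio, rlvr_time_share(ratio, PRETRAIN_UTIL, RLVR_UTIL))
```

Under these assumed utilizations, a 1:10 optimum would put nearly all GPU-time into RLVR, while a 40:1 optimum would put roughly a tenth of it there, which is one way to see why the two cases have such different implications for overtraining.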

Thane Ruthenis's Shortform

Vladimir_Nesov2mo40

The full text is on archive.today.

G Gordon Worley III's Shortform

Vladimir_Nesov3mo31

Peasants, when considered altogether, were crucial for the economy. So the intuitions about the idealized concept of a noble don't transfer given this disanalogy. And the actual historical nobles are not a robust prototype for the concept:

England at that time was conducting enclosures. Basically, rich people put up fences around common land to graze sheep on it. The poor were left with no land to grow food on, and had to go somewhere else. They ended up in cities, living in slums, trying to find scarce work and giving their last pennies to slumlords.

Towards a scale-free theory of intelligent agency

Vladimir_Nesov5mo*10

My point is more that you wouldn't want to define individuals as companies, or to say that only companies but not individuals can have agency. And that the things you are trying to get out of coalitional agency could already be there within individual agency.

A notion of choosing to listen to computations (or of analyzing things that are not necessarily agents in terms of which computations influence them) keeps coming up in my own investigations as a way of formulating coordination, decision making under logical uncertainty, and learning/induction. It incidentally seems useful for expressing coalitions, or as a way of putting market-like things inside agents. I dunno, it's not sufficiently fleshed out, so I don't really have a legible argument here, mostly an intuition that a notion of listening to computations would be more flexible in this context.

AI ALIGNMENT FORUM
Petrov Day
AF

