AI Alignment Forum

Vladimir Nesov — Comments (sorted by newest)
If anyone builds it, everyone will plausibly be fine
Vladimir_Nesov · 5d

I think such arguments buy us those 5% of no-takeover (conditional on superintelligence soon), and some of the moderate permanent disempowerment outcomes (maybe the future of humanity gets a whole galaxy out of the 4 billion or so galaxies in the reachable universe), as distinct from almost total permanent disempowerment or extinction. Though I expect that it matters which specific projects we ask early AGIs to work on, more than how aligned these early AGIs are, basically for the same reasons that companies and institutions employing humans are not centrally concerned with alignment of their employees in the ambitious sense, at the level of terminal values. More time to think of better projects for early AGIs, and time to reflect on feedback from such projects done by early AGIs, might significantly improve the chances of making ambitious alignment of superintelligence work eventually, on the first critical try, however long it takes to get ready to risk it.

If the creation of superintelligence happens on a schedule dictated by the economics of technology adoption, rather than by taking only the steps that we already know how to take correctly by the time we take them, the affordances available to qualitatively smarter AIs will get out of control. And their misalignment (in the ambitious sense, at the level of terminal values) will lead them to take over rather than comply with humanity's intentions and expectations, even if their own intentions and expectations don't involve humanity literally going extinct.

If anyone builds it, everyone will plausibly be fine
Vladimir_Nesov · 5d

I think AI takeover is plausible. But Eliezer’s argument that it’s more than 98% likely to happen does not stand up to scrutiny

I think the part of the argument where an AI takeover is almost certain to happen if superintelligence[1] is created soon is extremely convincing (I'd give this 95%), while the part where AI takeover almost certainly results in everyone dying is not. I'd only give 10-30% to everyone dying given an AI takeover (which is not really a decision-relevant distinction, just a major difference in models).

But also the outcome of not dying from an AI takeover cashes out as permanent disempowerment, that is, humanity not getting more than a trivial share of the reachable universe, with AIs instead taking almost everything. It's not centrally a good outcome that a sane civilization should be bringing about, even as it's also not centrally "doom". So the distinction between AI takeover and the book's titular everyone dying can be a crux; the two are not interchangeable.
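For concreteness, this is how those estimates combine. A minimal back-of-the-envelope sketch in Python, where the point estimates and the simple multiplication are my own simplification of the numbers above:

# Rough combination of the probability estimates given above (illustrative only).
p_takeover = 0.95                      # P(takeover | superintelligence created soon)
p_death_given_takeover = (0.10, 0.30)  # P(everyone dies | takeover), low and high ends

p_everyone_dies = [round(p_takeover * p, 3) for p in p_death_given_takeover]
print(p_everyone_dies)  # [0.095, 0.285], i.e. roughly 10-28%, well short of a 98% claim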


  1. AIs that are collectively qualitatively better than the whole of humanity at stuff, beyond being merely faster and somewhat above the level of the best humans at everything at the same time.

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
Vladimir_Nesov · 20d

Non-OpenAI pre-RLVR chatbots might serve as an anchor for how long it takes an AI company to turn an algorithmic idea into a frontier model once it becomes a clearly worthwhile thing to do. Arguably only Anthropic managed to catch up to OpenAI, and it took them 1.5 years with Sonnet 3.5. Even Google never caught up after 2+ years; their first credibly frontier chatbot is Gemini 2.5 Pro, which is already well into RLVR (and similarly for Grok 4). So it seems reasonable to expect that it would take about 2 years for RLVR-based models to start being done well, somewhere in 2026-2027.
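A minimal sketch of that anchoring arithmetic; treating late 2024 as the point where RLVR became clearly worthwhile is my assumption, and the lag range comes from the Sonnet 3.5 and Gemini 2.5 Pro observations above:

# Back-of-the-envelope timeline anchor (illustrative only, not a forecast model).
rlvr_clearly_worthwhile = 2024.75  # assumed: late 2024, when RLVR looked clearly worth pursuing
lag_years = (1.5, 2.5)             # roughly how long catching up on a new paradigm has taken

window = [rlvr_clearly_worthwhile + lag for lag in lag_years]
print(window)  # [2026.25, 2027.25] -> RLVR-based models done well somewhere in 2026-2027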

The IMO results probably indicate something about the current lower bound on capabilities in principle, for informally graded tasks such as natural language proofs. This is a lot higher than what finds practical use so far, and improvements in 2026-2027 might be able to capture this kind of thing (without needing the scale of 2026 compute).

Four ways learning Econ makes people dumber re: future AI
Vladimir_Nesov · 1mo

When we’re talking about AGI, we’re talking about creating a new intelligent species on Earth, one which will eventually be faster, smarter, better-coordinated, and more numerous than humans.

Here too the labor/capital distinction seems like a distraction. Species or not, it's quickly going to become most of what's going on in the world, probably in a way that looks like "economic prosperity" to humanity (that essentially nobody is going to oppose), but at some point humanity becomes a tiny little ignorable thing in the corner, and then there is no reason for any "takeover" (which doesn't mean there will be survivors).

There is a question of how quickly that happens, but "takeover" or "another species" don't seem like cruxes to me. It's all about scale, and precursors to scale; the fact that catastrophe might be possible in more disputed ways even earlier than that doesn't affect what can be expected a bit later in any case, a few years or even decades down the line.

My AGI timeline updates from GPT-5 (and 2025 so far)
Vladimir_Nesov · 1mo

GPT-5 probably isn't based on a substantially better pretrained model which is some evidence that OpenAI thinks the marginal returns from pretraining are pretty weak relative to the returns from RL

The model seems to be "small", but that doesn't necessarily mean it has less pretraining in it (in the form of overtraining) than RLVR. There are still no papers I'm aware of on what the compute-optimal (or GPU-time-optimal) pretraining:RLVR ratio might look like. Matching the GPU-time of pretraining and RLVR results in something like 4:1 (in terms of FLOPs), which would only be compute-optimal (or GPU-time-optimal) by unlikely coincidence.

If the optimal ratio of pretraining:RLVR is something like 1:10 (in FLOPs), then overtraining even smaller models is unimportant. But it could also be more like 40:1, in which case overtraining becomes a must (if inference cost/speed and HBM capacity of the legacy 8-chip servers force the param count to be smaller than compute optimal given the available training compute and the HBM capacity of GB200 NVL72).
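To illustrate why the ratio matters for overtraining, here is a minimal sketch under heuristics that are not from the comment above: the 6*N*D approximation for pretraining FLOPs, the ~20 tokens/parameter compute-optimal rule of thumb, and a hypothetical 5e26 FLOP total budget:

# How the pretraining:RLVR split changes the compute-optimal model size
# (illustrative only; 6*N*D and ~20 tokens/param are standard heuristics).
TOTAL_FLOPS = 5e26  # hypothetical total training budget

def pretrain_flops(ratio_pretrain, ratio_rlvr):
    return TOTAL_FLOPS * ratio_pretrain / (ratio_pretrain + ratio_rlvr)

for name, (rp, rr) in {"1:10": (1, 10), "4:1": (4, 1), "40:1": (40, 1)}.items():
    c_pre = pretrain_flops(rp, rr)
    n_opt = (c_pre / (6 * 20)) ** 0.5  # N solving 6 * N * (20 * N) = c_pre
    print(f"{name}: pretraining gets {c_pre:.1e} FLOPs, compute-optimal size ~{n_opt / 1e9:.0f}B params")

The larger the pretraining share, the larger the compute-optimal parameter count, and so the more serving constraints (inference cost/speed, HBM capacity) push toward overtraining a smaller-than-optimal model.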

Thane Ruthenis's Shortform
Vladimir_Nesov · 1mo

The full text is on archive.today.

G Gordon Worley III's Shortform
Vladimir_Nesov · 3mo

Peasants, taken all together, were crucial for the economy. So the intuitions about the idealized concept of a noble don't transfer, given this disanalogy. And the actual historical nobles are not a robust prototype for the concept:

England at that time was conducting enclosures. Basically, rich people put up fences around common land to graze sheep on it. The poor were left with no land to grow food on, and had to go somewhere else. They ended up in cities, living in slums, trying to find scarce work and giving their last pennies to slumlords.

Towards a scale-free theory of intelligent agency
Vladimir_Nesov · 5mo

My point is more that you wouldn't want to define individuals as companies, or to say that only companies but not individuals can have agency. And that the things you are trying to get out of coalitional agency could already be there within individual agency.

A notion of choosing to listen to computations (or of analyzing things that are not necessarily agents in terms of which computations influence them) keeps coming up in my own investigations as a way of formulating coordination, decision making under logical uncertainty, and learning/induction. It incidentally seems useful for expressing coalitions, or as a way of putting market-like things inside agents. I dunno, it's not sufficiently fleshed out, so I don't really have a legible argument here, mostly an intuition that a notion of listening to computations would be more flexible in this context.

abramdemski's Shortform
Vladimir_Nesov · 5mo

With AI assistance, the degree to which an alternative is ready to go can differ a lot from its prior human-developed state. Also, an idea that's ready to go is not yet an edifice of theory and software that's ready to go in replacing 5e28-FLOP transformer models, so some level of AI assistance is still necessary with 2-year timelines. (I'm not necessarily arguing that 2-year timelines are correct, but it's the kind of assumption that my argument should survive.)

The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are actually trying to help with alignment, even if their natural development and increasing power would end up pushing them out of that state soon thereafter. During this time, the state of current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage in implementing the changes in practice.

abramdemski's Shortform
Vladimir_Nesov · 6mo

prioritization depends in part on timelines

Any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. Even hopelessly incomplete research agendas could still be used to prompt future capable AI to focus on them, while in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely. So it makes sense to still prioritize things that have no hope at all of becoming practical for decades (with human effort), to make as much partial progress as possible in developing (and deconfusing) them in the next few years.

In this sense current human research, however far from practical usefulness, forms the data for alignment of the early AI-assisted or AI-driven alignment research efforts. The judgment of human alignment researchers who are currently working makes it possible to formulate more knowably useful prompts for future AIs that nudge them in the direction of actually developing practical alignment techniques.

Posts

Vladimir_Nesov's Shortform (1y)
Short Timelines Don't Devalue Long Horizon Research (6mo)
Bayesian Utility: Representing Preference by Probability Measures (16y)

Wikitag Contributions

Well-being (14 days ago, +58/-116)
Sycophancy (14 days ago, -231)
Quantilization (2 years ago, +13/-12)
Bayesianism (3 years ago, +1/-2)
Bayesianism (3 years ago, +7/-9)
Embedded Agency (3 years ago, -630)
Conservation of Expected Evidence (4 years ago, +21/-31)
Conservation of Expected Evidence (4 years ago, +47/-47)
Ivermectin (drug) (4 years ago, +5/-4)
Correspondence Bias (4 years ago, +35/-36)