Vladimir Nesov

Comments

With AI assistance, the degree to which an alternative is ready-to-go can differ a lot from its prior human-developed state. Also, an idea that's ready-to-go is not yet an edifice of theory and software that's ready to replace 5e28 FLOPs transformer models, so some level of AI assistance is still necessary with 2-year timelines. (I'm not necessarily arguing that 2-year timelines are correct, but it's the kind of assumption that my argument should survive.)

The critical period includes the time when humans are still in effective control of the AIs, or when vaguely aligned and properly incentivised AIs are in control and are actually trying to help with alignment, even if their natural development and increasing power would end up pushing them out of that state soon thereafter. During this time, the state of current research culture shapes the path-dependent outcomes. Superintelligent AIs that are reflectively stable will no longer allow path dependence in their further development, but before that happens the dynamics can be changed to an arbitrary extent, especially with AI efforts as leverage in implementing the changes in practice.

> prioritization depends in part on timelines

Any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. Even hopelessly incomplete research agendas could still be used to prompt future capable AIs to focus on them, while in the absence of such incomplete agendas we'd need to rely more completely on the AIs' own judgment. So it still makes sense to prioritize things that have no hope at all of becoming practical for decades (with human effort), in order to make as much partial progress as possible in developing (and deconfusing) them in the next few years.

In this sense current human research, however far from practical usefulness, forms the data for aligning the early AI-assisted or AI-driven alignment research efforts. The judgment of human alignment researchers working now makes it possible to formulate more knowably useful prompts for future AIs, nudging them toward actually developing practical alignment techniques.

It's an essential aspect of decision making for an agent to figure out where it might be. Thought experiments try to declare what the current situation is, but they won't necessarily succeed in doing so convincingly. Algorithmic induction, such as updating from the Solomonoff prior, is the basic way an agent figures out which situations it should care about, and declaring that we are working with a particular thought experiment doesn't affect the prior. In line with updatelessness, an agent should be ready for observations in general (weighted by how much it cares about each of them), rather than only for particular "fair" observations, so singling out observations that describe "fair" thought experiments doesn't seem right either.
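A toy sketch of the updateless point, under loudly stated assumptions: the two situations, their description lengths, and the utilities below are all hypothetical, and a small table of bit-lengths stands in for a real Solomonoff prior. The agent scores whole observation-to-action policies against the prior, rather than first accepting the declared scenario and conditioning on it.

```python
from itertools import product

# Hypothetical situations, each with a description length in bits (a stand-in
# for a Solomonoff-style prior) and the probability of each observation in it.
SITUATIONS = {
    "as_declared": {"bits": 10, "obs": {"prompt": 1.0}},
    "simulated_by_predictor": {"bits": 11, "obs": {"prompt": 0.5, "other": 0.5}},
}

# Hypothetical utility of taking an action in a situation (invented numbers).
UTILITY = {
    ("as_declared", "comply"): 1.0,
    ("as_declared", "refuse"): 0.0,
    ("simulated_by_predictor", "comply"): -3.0,
    ("simulated_by_predictor", "refuse"): 2.0,
}

OBSERVATIONS = ["prompt", "other"]
ACTIONS = ["comply", "refuse"]


def prior(situation):
    # Unnormalized 2^-length prior; declaring a scenario doesn't change it.
    return 2.0 ** -SITUATIONS[situation]["bits"]


def policy_value(policy):
    # Updateless evaluation: score a whole observation -> action map against
    # the prior, instead of conditioning on one "fair" observation first.
    total = 0.0
    for s, info in SITUATIONS.items():
        for o, p_obs in info["obs"].items():
            total += prior(s) * p_obs * UTILITY[(s, policy[o])]
    return total


def best_policy():
    # Enumerate every deterministic policy and keep the highest-valued one.
    candidates = [
        dict(zip(OBSERVATIONS, acts))
        for acts in product(ACTIONS, repeat=len(OBSERVATIONS))
    ]
    return max(candidates, key=policy_value)
```

With these made-up numbers, `best_policy()` refuses on both observations, whereas an agent that took the declared scenario at face value would comply on `"prompt"`.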

Coalitional agency seems like an unnecessary constraint on the design of a composite agent, since an individual agent could just (choose to) listen to other agents and behave the way their coalition would endorse, thereby effectively becoming a composite agent without being composite "by construction". The step where an agent chooses which other (hypothetical) agents to listen to makes constraints on the nature of agents unnecessary, because the choice to listen to some agents and not others can impose any constraints that particular agent cares about, so an "agent" could be as vague as a "computation" or a program.

(Choosing to listen to a computation means choosing a computation based on considerations other than its output, committing to use its output in a particular way without yet knowing what it's going to be, and carrying out that commitment once the output becomes available, regardless of what it turns out to be.)
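The commit-then-run structure of listening to a computation can be sketched in Python. This is a minimal illustration, not an established API: the `Commitment` class and the `other_agents_vote` stand-in are hypothetical names. Both the computation and the rule for using its output are fixed at construction time, before any output exists, and `carry_out` applies that rule to whatever output appears.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Commitment(Generic[T, R]):
    """Commit to using a computation's output in a fixed way before seeing it."""

    def __init__(self, computation: Callable[[], T], use: Callable[[T], R]) -> None:
        # The computation is chosen on grounds other than its output, and the
        # rule for using the output is fixed now, before the output exists.
        self._computation = computation
        self._use = use
        self._done = False

    def carry_out(self) -> R:
        # Run the computation and apply the pre-committed rule, whatever the
        # output turns out to be; there is no step that can veto the result.
        if self._done:
            raise RuntimeError("commitment already carried out")
        self._done = True
        return self._use(self._computation())


def other_agents_vote() -> str:
    # Hypothetical stand-in for another agent's computation.
    return "cooperate"


commitment = Commitment(other_agents_vote, use=lambda verdict: f"act: {verdict}")
print(commitment.carry_out())  # prints "act: cooperate"
```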

This way we can get back to individual rationality: figuring out how an agent should choose which other agents/computations to listen to when coming up with its own beliefs and decisions. But actually listening to those other computations on occasion is the step missing from most decision theories, and it would take care of interaction with other agents (both actual and hypothetical).

Discussions of how to aggregate values and how to aggregate probabilities feel disjoint. The Jeffrey-Bolker formulation of expected utility presents the preference data as two probability distributions over the same sample space, so that the expected utility of an event is reconstructed as the ratio of the event's measures under the two priors. (The measure that goes into the numerator is "shouldness", and the other one remains "probability".)
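A small worked instance of the ratio, assuming a finite sample space with made-up probabilities and nonnegative utilities, and taking the shouldness of each outcome to be its probability times its utility. With exact fractions, the ratio for an event matches the ordinary conditional expected utility.

```python
from fractions import Fraction

# A toy finite sample space; probabilities and utilities are made up.
PROBABILITY = {"a": Fraction(1, 2), "b": Fraction(3, 10), "c": Fraction(1, 5)}
UTILITY = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(4)}

# Shouldness of an outcome: probability times utility (assumes nonnegative
# utilities, so that shouldness is a measure like probability).
SHOULDNESS = {w: PROBABILITY[w] * UTILITY[w] for w in PROBABILITY}


def measure(mu, event):
    # Measure of an event: sum over the outcomes it contains.
    return sum((mu[w] for w in event), Fraction(0))


def expected_utility(event):
    # Expected utility of an event is the ratio of its two measures.
    return measure(SHOULDNESS, event) / measure(PROBABILITY, event)


print(expected_utility({"a", "b"}))  # 11/8
```

Here the event {a, b} gets (1/2 + 3/5) / (1/2 + 3/10) = 11/8, the same as averaging the utilities 1 and 2 with weights 1/2 and 3/10.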

This gestures at a way of reducing the problem of aggregating values to the problem of aggregating probabilities. In particular, markets seem to be easier to set up for probabilities than for expected utilities, so it might be better to set up two markets that are technically the same type of thing, one for probability and one for shouldness, than to target expected utility directly. Values of different agents are incomparable, but so are priors, so any fundamental issues with aggregation seem to remain unchanged by this reformulation. These can't be "prediction" markets, since resolution is not straightforward and is somewhat circular, grounded in what the coalition will eventually settle on, but logical induction already has to deal with similar issues.
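As a rough sketch of aggregating the two measures separately, the code below swaps in plain linear pooling (a weighted average of each measure) as a stand-in for the markets in the text; the two agents, their weights, and all numbers are hypothetical. It doesn't address incomparability, it only shows what the ratio of two separately pooled measures looks like.

```python
from fractions import Fraction

# Two hypothetical agents over the sample space {"x", "y"}, each described by
# a probability measure "p" and a shouldness measure "s" (invented numbers).
AGENTS = [
    {"p": {"x": Fraction(1, 2), "y": Fraction(1, 2)},
     "s": {"x": Fraction(1, 2), "y": Fraction(0)}},
    {"p": {"x": Fraction(1, 4), "y": Fraction(3, 4)},
     "s": {"x": Fraction(0), "y": Fraction(3, 4)}},
]
WEIGHTS = [Fraction(1, 2), Fraction(1, 2)]


def pool(key):
    # Linear pooling of one kind of measure, done separately for each kind.
    outcomes = AGENTS[0][key]
    return {
        w: sum(wt * agent[key][w] for wt, agent in zip(WEIGHTS, AGENTS))
        for w in outcomes
    }


def expected_utility(p, s, event):
    # Ratio of the pooled shouldness to the pooled probability of an event.
    return sum(s[w] for w in event) / sum(p[w] for w in event)


pooled_p, pooled_s = pool("p"), pool("s")
```

With these numbers the pooled expected utility of `{"x"}` is 2/3 and of `{"y"}` is 3/5. Under linear pooling, the pooled expected utility of an event works out to a weighted average of the agents' own expected utilities, with each agent's weight scaled by the probability it assigns to that event.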

Cyberattacks can't reliably disable anything, or disable it for more than days to weeks, though, and there are dozens of major datacenter campuses from multiple somewhat independent vendors. Hypothetical AI-developed attacks might change that, but then there will also be AI-developed information security, adapting to any known kinds of attacks and making them ineffective shortly afterward. So the MAD analogy seems tenuous: the effect size (of this particular kind of intervention) is so much smaller that it seems misleading to even mention cyberattacks in this role/context.

Oversight, auditing, and accountability are jobs. Agriculture shows that 95% of jobs going away is not the problem. But AI might be better at the new jobs as well, without any window of opportunity where humans are initially doing them and AI needs to catch up. Instead it's AI that starts doing all the new things well first and humans get no opportunity to become competitive at anything, old or new, ever again.

Even the formulation of aligned high-level tasks and the intent alignment of AIs make sense as jobs that could be done well by misaligned AIs for instrumental reasons. This is not even deceptive alignment, but it still plausibly segues into gradual disempowerment or a sharp left turn.

My point is that a bit of scaling (like 3x) doesn't matter, even though at the scale of GPT-4.5 or Grok 3 it requires building a $5bn training system, while a lot of scaling (like 2000x up from the original GPT-4) is still the most important thing impacting capabilities that will predictably happen soon. And it's going to arrive a little at a time, so it won't be obviously impactful at any particular step and won't do anything to dispel the rumors that scaling is no longer important. It's a rising-sea kind of thing (if you have the compute).
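The "a little at a time" point is simple arithmetic: 2000x, reached in 3x increments, takes only about seven steps, each of which on its own is at the level of the 3x change described as not mattering. A minimal sketch; the helper name is hypothetical.

```python
import math


def steps_to_reach(total_factor: float, step_factor: float) -> float:
    """Number of equal multiplicative steps of `step_factor` in `total_factor`."""
    return math.log(total_factor) / math.log(step_factor)


n = steps_to_reach(2000, 3)
print(f"{n:.1f}")  # 6.9
```

Since 3^6 = 729 and 3^7 = 2187, seven 3x steps already overshoot 2000x.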

Long reasoning traces were always bound to start working at some point, and the s1 paper illustrates that we don't really have evidence yet that R1-like training creates, rather than elicits, nontrivial capabilities (things that couldn't be transferred with a mere 1000 traces). Amodei is suggesting that RL training can be scaled to billions of dollars, but it's unclear whether this assumes that AIs will automate the creation of verifiable tasks. If constructing such tasks (or very good reward models) is the bottleneck, this direction of scaling can't quickly get very far outside specialized domains like chess, where a single verifiable task (winning a game) generates endless data.
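To illustrate how one verifiable task can generate unlimited labeled data, here is a minimal sketch that uses a simple subtraction game as a hypothetical stand-in for chess: players alternately take 1 to 3 stones, and whoever takes the last stone wins. The winner is checked mechanically, so random self-play yields as many labeled games as you care to run, with no human-constructed tasks. Function names and the pile size are assumptions for illustration.

```python
import random


def play_random_game(pile: int, rng: random.Random):
    """Random self-play: take 1-3 stones per move, the last taker wins.

    Returns the move history as (player, stones_before, move) tuples and the
    winner. The winner is determined mechanically, so each game is a free,
    verified training label.
    """
    history = []
    player = 0
    while pile > 0:
        move = rng.randint(1, min(3, pile))
        history.append((player, pile, move))
        pile -= move
        if pile == 0:
            return history, player
        player = 1 - player
    raise ValueError("pile must be positive")


def generate_data(n_games: int, pile: int = 10, seed: int = 0):
    # Any number of games can be produced from the single task "win the game".
    rng = random.Random(seed)
    return [play_random_game(pile, rng) for _ in range(n_games)]
```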

The quality data wall and flatlining benchmarks (with base model scaling) are about compute multipliers that depend on good data but don't scale very far, as opposed to scalable multipliers like high-sparsity MoE. So I think these recent 4x-a-year compute multipliers mostly won't work above 1e27-1e28 FLOPs, which superficially looks bad for scaling of pretraining but won't impact the less legible aspects of scaling token prediction (measured in perplexity on non-benchmark data) that are more important for general intelligence. There's also the hard data wall of literally running out of text data, but being less stringent about data quality and training for multiple epochs (giving up the ephemeral compute multipliers from data quality) should keep it at bay for now.

> my intuitions have been shaped by events like the pretraining slowdown

I don't see it. GPT-4.5 is much better than the original GPT-4, probably at 15x more compute. But it's not 100x more compute. And GPT-4o is an intermediate point, so the change from GPT-4o to GPT-4.5 is even smaller, maybe 4x.

I think a 3x change in compute has an effect at the level of noise from different reasonable choices in constructing a model, and 100K H100s is only 5x more than the 20K H100s of 2023. It's not a slowdown relative to what it should've been. And there are models with 200x more raw compute than went into GPT-4.5 that are probably coming in 2027-2029, much more than the 4x-15x observed since 2022-2023.
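Converting the compute ratios from the text into orders of magnitude (base-10 logarithms) makes the comparison easier to read. The figures are the rough estimates given in the text, not measurements, and the labels are just descriptive.

```python
import math

# Rough compute ratios as estimated in the text (not measurements).
RATIOS = {
    "GPT-4 -> GPT-4.5": 15,
    "GPT-4o -> GPT-4.5": 4,
    "20K -> 100K H100s": 100_000 / 20_000,
    "GPT-4.5 -> 2027-2029 systems": 200,
}

for name, ratio in RATIOS.items():
    # Orders of magnitude make multiplicative jumps directly comparable.
    print(f"{name}: {ratio:g}x = {math.log10(ratio):.2f} OOMs")
```

The projected 200x step comes to about 2.3 orders of magnitude, roughly twice the ~1.2 of the estimated GPT-4 to GPT-4.5 step.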

LLMs compute the probability of a sequence, but the truth/good distinction is captured by the two-dimensional Jeffrey-Bolker measure (I'm calling its components "probability" and "shouldness"; their ratio is the expected utility of an event). Shouldness is reconstructed from probability and expected utility as their product, so plausibly it behaves similarly to probability on long sequences: it generally gets lower for longer sequences, but tends to be higher for simpler sequences.
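A short derivation of why shouldness should track probability on long sequences, under an added assumption not stated in the text: expected utility is positive and bounded above by some constant U_max. Writing x_{1:n} for a sequence and factoring its probability autoregressively:

```latex
S(x_{1:n}) = P(x_{1:n})\,\mathrm{EU}(x_{1:n}), \qquad
\log S(x_{1:n}) = \sum_{t=1}^{n} \log P(x_t \mid x_{<t}) + \log \mathrm{EU}(x_{1:n}),
\qquad
0 < \mathrm{EU}(x_{1:n}) \le U_{\max} \;\Rightarrow\; S(x_{1:n}) \le U_{\max}\, P(x_{1:n}).
```

Since the log-probability is a sum of per-token terms, it typically keeps decreasing as the sequence gets longer, and the bounded log-EU term can't offset that indefinitely, so log-shouldness follows the same downward trend.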

The analogy between probability and shouldness suggests that some form of pretraining might be able to create models for either of them (as opposed to a base model that learns something in between from raw data with no supervision from preference data). Expected utility is then the ratio: instead of looking at the logits of one LLM, we look at differences of logits for two LLMs, a shouldness-LLM and a probability-LLM (with some regularization that anchors to a base model instead of Goodharting toward sequences with high approximate expected utility but low probability). Possibly this needs interspersing preference training with pretraining, rather than only applying preference training during post-training, so that there are two different pretrained models that nurture different collections of circuits (for probability and for shouldness).
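A toy sketch of scoring a sequence with two models. It uses per-token log-probabilities (log-softmax outputs) rather than raw logits, three invented unigram distributions in place of real LLMs, and a hypothetical regularizer that adds `beta` times the base model's log-probability. The text doesn't specify the form of the regularization, so this is only one possible choice.

```python
import math

# Toy unigram "language models" over a two-token vocabulary. Real models would
# condition on the prefix; these numbers are invented for illustration.
PROBABILITY_LM = {"a": 0.5, "b": 0.5}
SHOULDNESS_LM = {"a": 0.8, "b": 0.2}
BASE_LM = {"a": 0.6, "b": 0.4}


def log_prob(lm, seq):
    # Sequence log-probability as a sum of per-token log-probabilities.
    return sum(math.log(lm[tok]) for tok in seq)


def log_expected_utility(seq):
    # Difference of the two models' log-probabilities. If both models are
    # normalized, this is log EU only up to an additive constant (the log of
    # total shouldness), which doesn't affect comparisons between sequences.
    return log_prob(SHOULDNESS_LM, seq) - log_prob(PROBABILITY_LM, seq)


def regularized_score(seq, beta=0.5):
    # Hypothetical regularizer: add beta * log P_base, so that sequences with
    # high estimated EU but very low base-model probability are penalized.
    return log_expected_utility(seq) + beta * log_prob(BASE_LM, seq)
```

Here `log_expected_utility("ab")` equals log(0.64): shouldness 0.8 * 0.2 = 0.16 over probability 0.5 * 0.5 = 0.25.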

(Some kind of Solomonoff induction analogue for probability/shouldness should be clearer to express, and might be more relevant in a decision theory context. You start with description lengths of programs in two different languages, a language of probability-programs and a language of shouldness-programs, and then convert these into probability and shouldness distributions over sequences, enabling both probability induction and shouldness induction for the next element of a sequence. Solomonoff induction ignores distinctions between languages in the limit, but this kind of probability/shouldness induction works with pairs of languages, and the distinction between the two languages in a given pair is the most important thing, as it defines expected utility.)
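A minimal sketch of the two-language version, with heavy simplifications: each "language" is a short hypothetical list of programs given as (description length in bits, output prefix) pairs instead of an enumeration of a universal machine, and all lengths and outputs are invented. The same 2^-length mixture gives next-symbol prediction for either language, and the ratio of the two mixtures gives the expected utility of a prefix.

```python
from fractions import Fraction

# Hypothetical toy "languages": (description length in bits, program output
# truncated to a finite prefix). A real version would enumerate programs of a
# universal machine; these entries are invented for illustration.
PROBABILITY_PROGRAMS = [(2, "0000000000"), (3, "0101010101"), (5, "0110100110")]
SHOULDNESS_PROGRAMS = [(4, "0000000000"), (2, "0101010101"), (5, "0110100110")]


def mixture(programs, prefix):
    # Total weight of programs whose output starts with `prefix`: sum of 2^-length.
    return sum(
        (Fraction(1, 2 ** length) for length, out in programs if out.startswith(prefix)),
        Fraction(0),
    )


def predict_next(programs, prefix):
    # Conditional weight of each next symbol, as in Solomonoff induction.
    # Assumes at least one program matches `prefix` (otherwise division by zero).
    total = mixture(programs, prefix)
    return {b: mixture(programs, prefix + b) / total for b in "01"}


def expected_utility(prefix):
    # Ratio of shouldness to probability of the event "sequence starts with prefix".
    return mixture(SHOULDNESS_PROGRAMS, prefix) / mixture(PROBABILITY_PROGRAMS, prefix)
```

In this toy, the prefix "01" gets expected utility 9/5 because the shouldness language describes the alternating sequence more cheaply (2 bits) than the probability language does (3 bits).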
