Crossposted from AI Impacts

[Epistemic status: I wrote this for Blog Post Day II. Sorry it's late.]

In this post, I distinguish between three different kinds of competitiveness -- Performance, Cost, and Date -- and explain why I think these distinctions are worth the brainspace they occupy. For example, they help me introduce and discuss a problem for AI safety proposals having to do with aligned AIs being outcompeted by unaligned AIs.

Distinguishing three kinds of competitiveness and competition

A system is performance-competitive insofar as its ability to perform relevant tasks compares with that of competing systems. If it is better than any competing system at the relevant tasks, it is very performance-competitive. If it is almost as good as the best competing system, it is less performance-competitive.

(For AI in particular, “speed,” “quality,” and “collective” intelligence as Bostrom defines them all contribute to performance-competitiveness.)

A system is cost-competitive to the extent that it costs less to build and/or operate than its competitors. If it is more expensive, it is less cost-competitive, and if it is much more expensive, it is not at all cost-competitive.

A system is date-competitive to the extent that it can be created sooner than (or not much later than) its competitors. If it can only be created after a prohibitive delay, it is not at all date-competitive.

A performance competition is a competition that performance-competitiveness helps you win. The more important performance-competitiveness is to winning, the more intense the performance competition is.

Likewise for cost and date competitions. Most competitions are all three types, to varying degrees. Some competitions are none of the types; e.g. a “competition” where the winner is chosen randomly.
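To make the definitions concrete, here is a toy model (all names, scores, and weights are invented for illustration, not taken from the post): a competition assigns a weight to each of the three kinds of competitiveness, with the weights expressing how intense the competition is along that dimension, and the winner is the system with the best weighted score.

```python
# Toy model: a competition weights performance-, cost-, and date-competitiveness.
# All numbers and system names below are hypothetical.

def winner(systems, weights):
    """Return the name of the system with the highest weighted score.

    systems: dict mapping name -> (performance, cheapness, earliness),
             each in [0, 1], where higher always means more competitive.
    weights: (w_performance, w_cost, w_date), expressing how intense the
             competition is along each dimension.
    """
    def score(traits):
        return sum(w * t for w, t in zip(weights, traits))
    return max(systems, key=lambda name: score(systems[name]))

systems = {
    "aligned_AI":   (0.9, 0.8, 0.3),  # more capable and cheaper, but slower to build
    "unaligned_AI": (0.7, 0.7, 0.9),  # cuts corners, so it is ready much sooner
}

# In a FOOM-like scenario, date-competitiveness dominates:
foom_winner = winner(systems, (0.2, 0.1, 0.7))        # -> "unaligned_AI"

# In a gradual-takeover scenario, performance and cost dominate:
gradual_winner = winner(systems, (0.5, 0.4, 0.1))     # -> "aligned_AI"
```

The same pair of systems wins or loses depending only on the weights, which is the sense in which "which kind of competition is this?" matters independently of how good the systems are.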

I briefly searched the AI alignment forum for uses of the word “competitive.” It seems that when people talk about competitiveness of AI systems, they usually mean performance-competitiveness, but sometimes mean cost-competitiveness, and sometimes both at once. Meanwhile, I suspect that this important post can be summarized as “We should do prosaic AI alignment in case only prosaic AI is date-competitive.”

Putting these distinctions to work

First, I’ll sketch some different future scenarios. Then I’ll sketch how different AI safety schemes might be more or less viable depending on which scenario occurs. For me at least, having these distinctions handy makes this stuff easier to think and talk about.

Disclaimer: The three scenarios I sketch aren’t supposed to represent the scenarios I think most likely; similarly, my comments on the three safety proposals are mere hot takes. I’m just trying to illustrate how these distinctions can be used.

Scenario: FOOM: There is a level of performance which leads to a localized FOOM, i.e. very rapid gains in performance combined with very rapid drops in cost, all within a single AI system (or family of systems in a single AI lab). Moreover, these gains & drops are enough to give decisive strategic advantage to the faction that benefits from them. Thus, in this scenario, control over the future is mostly a date competition. If there are two competing AI projects, and one project is building a system which is twice as capable and half the price but takes 100 days longer to build, that project will lose.

Scenario: Gradual Economic Takeover: The world economy gradually accelerates over several decades, and becomes increasingly dominated by billions of AGI agents. However, no one entity (AI or human, individual or group) has most of the power. In this scenario, control over the future is mostly a cost and performance competition. The values which shape the future will be the values of the bulk of the economy, and that in turn will be the values of the most popular and successful AGI designs, which in turn will be the designs that have the best combination of performance- and cost-competitiveness. Date-competitiveness is mostly irrelevant.

Scenario: Final Conflict: It’s just like the Gradual Economic Takeover scenario, except that several powerful factions are maneuvering and scheming against each other, in a Final Conflict to decide the fate of the world. This Final Conflict takes almost a decade, and mostly involves “cold” warfare, propaganda, coalition-building, alliance-breaking, and that sort of thing. Importantly, the victor in this conflict will be determined not so much by economic might as by clever strategy; a less well-resourced faction that is nevertheless more far-sighted and strategic will gradually undermine and overtake a larger/richer but more dysfunctional faction. In this context, having the most capable AI advisors is of the utmost importance; having your AIs be cheap is much less important. In this scenario, control of the future is mostly a performance competition. (Meanwhile, in this same scenario, popularity in the wider economy is a moderately intense competition of all three kinds.)

Proposal: Value Learning: By this I mean schemes that take state-of-the-art AIs and train them to have human values. I currently think of these schemes as not very date-competitive, but pretty cost-competitive and very performance-competitive. I say value learning isn’t date-competitive because my impression is that it is probably harder to get right, and thus slower to get working, than other alignment proposals. Value learning would be better for the gradual economic takeover scenario because the world will change slowly, so we can afford to spend the time necessary to get it right, and once we do it’ll be a nice add-on to the existing state-of-the-art systems that won’t sacrifice much cost or performance.

Proposal: Iterated Distillation and Amplification: By this I mean… well, it’s hard to summarize. It involves training AIs to imitate humans, and then scaling them up until they are arbitrarily powerful while still human-aligned. I currently think of this scheme as decently date-competitive but not as cost-competitive or performance-competitive. But lack of performance-competitiveness isn’t a problem in the FOOM scenario because IDA is above the threshold needed to go FOOM; similarly, lack of cost-competitiveness is only a minor problem because the first project to build FOOM-capable AI, if it doesn’t have enough money already, will probably be able to attract a ton of investment (e.g. via being nationalized) without even using its AI for anything, and can then use that money to pay the extra cost of aligning the AI via IDA.

Proposal: Impact regularization: By this I mean attempts to modify state-of-the-art AI designs so that they deliberately avoid having a big impact on the world. I think of this scheme as being cost-competitive and fairly date-competitive. I think of it as being performance-uncompetitive in some competitions, but performance-competitive in others. In particular, I suspect it would be very performance-uncompetitive in the Final Conflict scenario (because AI advisors of world leaders need to be impactful to do anything), yet nevertheless performance-competitive in the Gradual Economic Takeover scenario.

Putting these distinctions to work again

I came up with these distinctions because they helped me puzzle through the following problem:

Lots of people worry that in a vastly multipolar, hypercompetitive AI economy (such as described in Hanson’s Age of Em or Bostrom’s “Disneyland without children” scenario) eventually pretty much everything of merely intrinsic value will be stripped away from the economy; the world will be dominated by hyper-efficient self-replicators of various kinds, performing their roles in the economy very well and seeking out new roles to populate but not spending any time on art, philosophy, leisure, etc. Some value might remain, but the overall situation will be Malthusian.
Well, why not apply this reasoning more broadly? Shouldn’t we be pessimistic about any AI alignment proposal that involves using aligned AI to compete with unaligned AIs? After all, at least one of the unaligned AIs will be willing to cut various ethical corners that the aligned AIs won’t, and this will give it an advantage.

This problem is more serious the more the competition is cost-intensive and performance-intensive. Sacrificing things humans value is likely to lead to cost- and performance-competitiveness gains, so the more intense the competition is in those ways, the worse our outlook is.

However, it’s plausible that the gains from such sacrifices are small. If so, we need only worry in scenarios of extremely intense cost and performance competition.

Moreover, the extent to which the competition is date-intensive seems relevant. Optimizing away things humans value, and gradually outcompeting systems which didn’t do that, takes time. And plausibly, scenarios which are not at all date competitions are also very intense performance and cost competitions. (Given enough time, lots of different designs will appear, and minor differences in performance and cost will have time to overcome differences in luck.) On the other hand, aligning AI systems might take time too, so if the competition is too date-intensive things look grim also. Perhaps we should hope for a scenario in between, where control of the future is a moderate date competition.

Concluding thoughts

These distinctions seem to have been useful for me. However, I could be overestimating their usefulness. Time will tell; we shall see if others make use of them.

If you think they would be better if the definitions were rebranded or modified, now would be a good time to say so! I currently expect that a year from now my opinions on which phrasings and definitions are most useful will have evolved. If so, I'll come back and update this post.

Thanks to Katja Grace and Ben Pace for comments on a draft.




In the context of prosaic AI alignment, I've recently taken to splitting up competitiveness into “training competitiveness” and “objective competitiveness,”[1] where training competitiveness refers to the difficulty of training the system to succeed at its objective and objective competitiveness refers to the usefulness of a system that succeeds at that objective. I think my training competitiveness broadly maps onto a combination of your cost and date competitiveness and my objective competitiveness broadly maps onto your performance competitiveness. I think I mildly like my dichotomy better than your trichotomy in terms of thinking about prosaic AI alignment schemes, as I think it provides a better picture of the specific parts of a prosaic AI alignment proposal that are helping or hindering its overall competitiveness—e.g. if it's not very objective competitive, that tells you that you need a stronger objective, and if it's not very training competitive, that tells you that you need a better training process (it's also nice in terms of mirroring the inner/outer alignment distinction). That being said, your trichotomy is certainly more general in terms of applying to things that aren't just prosaic AI alignment.

  1. Objective competitiveness isn't a great term, though, since it can be misread as the opposite of subjective competitiveness—perhaps I'll switch now to using performance competitiveness instead. ↩︎

Mmm, nice. Thanks! I like your distinction also. I think yours is sufficiently different that we shouldn't see the two sets of distinctions as competing.* A system which has an objective which would be capable on paper but isn't capable in practice due to inner-alignment failures would be performance-uncompetitive but objective-competitive. For this reason I think we shouldn't equate objective and performance competitiveness.

If operating an AI system turns out to be an important part of the cost, then cost+date competitiveness would turn out to be different from training competitiveness, because cost competitiveness includes whatever the relevant costs are. However I expect operating costs will be much less relevant to controlling the future than costs incurred during the creation of the system (all that training, data-gathering, infrastructure building, etc.) so I think the mapping between cost+date competitiveness and training competitiveness basically works.

*Insofar as they are competing, I still prefer mine; as you say, it applies to more than just prosaic AI alignment proposals. Moreover, it makes it easier for us to talk about competitions as well, e.g. "In the FOOM scenario we need to win a date competition; cost-competitiveness still matters but not as much." Moreover cost, performance, and date are fairly self-explanatory terms, whereas as you point out "objective" is more opaque. Moreover I think it's worth distinguishing between cost and date competitiveness; in some scenarios one will be much more important than the other, and of course the two kinds of competitiveness vary independently in AI safety schemes (indeed maybe they are mildly anti-correlated? Some schemes are fairly well-defined and codified already, but would require tons of compute, whereas other schemes are more vague and thus would require tons of tweaking and cautious testing to get right, but don't take that much compute). I do like how your version maps more onto the inner vs. outer alignment distinction.

IDA is really aiming to be cost-competitive and performance-competitive, say to within overhead of 10%. That may or may not be possible, but it's the goal.

If the compute required to build and run your reward function is small relative to the compute required to train your model, then it seems like overhead is small. If you can do semi-supervised RL and only require a reward function evaluation on a minority of trajectories (e.g. because most of the work is learning about how to manipulate the environment), then you can be OK as long as the cost of running the reward function isn't too much higher.

Whether that's possible is a big open question. Whether it's date competitive depends on how fast you figure out how to do it.

I knew that the goal was to get IDA to be cost-competitive, but I thought current versions of it weren't. But that was just my rough impression; glad to be wrong, since it makes IDA seem even more promising. :) Of all the proposals I've heard of, IDA seems to have the best combination of cost, date, and performance-competitiveness.

I think our current best implementation of IDA would neither be competitive nor scalably aligned :)

Update: Jan Leike independently thought of a very similar taxonomy! I actually like it somewhat more; see comments for why.

In most cases you can continuously trade off performance and cost; for that reason I usually think of them as a single metric of "competitive with X% overhead." I agree there are cases where they come apart, but I think there are pretty few examples. (Even for nuclear weapons you could ask "how much more expensive is it to run a similarly-destructive bombing campaign with conventional explosives.")

I think this works best if you consider a sequence of increments each worth +10%, rather than say accumulating 70 of those increments, because "spend 1000x more" is normally not available and so we don't have a useful handle on what a technology looks like when scaled up 1000x (and that scaleup would usually involve a bunch of changes that are hard to anticipate).

That is, if we have a sequence of technologies A0, A1, A2, ..., AN, each of which is 10% cheaper than the one before, then we may say that AN is better than A0 by N 10% steps (rather than trying to directly evaluate how many orders of magnitude you'd have to spend on A0 to compete with AN, because the process "spend a thousand times more on A0 in a not-stupid way" is actually kind of hard to imagine).
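A quick arithmetic check of the numbers in the comment above (the step size and target factor are the ones mentioned there; the calculation itself is just compounding):

```python
import math

# Each +10% increment multiplies cost-effectiveness by 1.1, so N increments
# give a factor of 1.1**N. How many increments span a 1000x difference?
steps_for_1000x = math.log(1000) / math.log(1.1)

# steps_for_1000x is roughly 72.5, so accumulating ~70 increments of +10%
# really is comparable to "spend 1000x more" -- which is why reasoning about
# one adjacent increment at a time is more tractable than imagining a single
# 1000x scaleup directly.
```

This supports the framing of A0 through AN as a chain of small, individually-imaginable steps rather than one enormous leap.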

I agree this may be true in most cases, but the chance of it not being true for AI is large enough to motivate the distinction. Besides, not all cases in which performance and cost can be traded off are the same; in some scenarios the "price" of performance is very high whereas in other scenarios it is low. (e.g. in Gradual Economic Takeover, let's say, a system being twice as qualitatively intelligent could be equivalent to being a quarter the price. Whereas in Final Conflict, a system twice as qualitatively intelligent would be equivalent to being one percent the price.) So if we are thinking of a system as "competitive with X% overhead," well, X% is going to vary tremendously depending on which scenario is realized. Seems worth saying e.g. "costs Y% more compute, but is Z% more capable."

Very interesting definitions! I like the way they're used here to compare different scenarios.

Proposal: Iterated Distillation and Amplification: [...] I currently think of this scheme as decently date-competitive but not as cost-competitive or performance-competitive.

I think IDA's date-competitiveness will depend on the progress we'll have in inner alignment (or our willingness to bet against inner alignment problems occurring, and whether we'll be correct about it). Also, I don't see why we should expect IDA to not be very performance-competitive (if I understand correctly the hope is to get a system that can do anything useful that a human with a lot of time can do).

Generally, when using these definitions for comparing alignment approaches (rather than scenarios) I suspect we'll end up talking a lot about "the combination of date- and performance-competitiveness", because I expect the performance-competitiveness of most approaches will depend on how much research effort is invested in them.

Good point about inner alignment problems being a blocker to date-competitiveness for IDA... but aren't they also a blocker to date-competitiveness for every other alignment scheme too pretty much? What alignment schemes don't suffer from this problem?

I'm thinking "Do anything useful that a human with a lot of time can do" is going to be substantially less capable than full-blown superintelligent AGI. However, that's OK, because we can use IDA as a stepping-stone to that. IDA gets us an aligned system substantially more capable than a human, and we use that system to solve the alignment problem and build something even better.

It's interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness. I think it's fine to just talk about "competitiveness" full stop, and only bother to specify what we mean more precisely when needed. Sometimes we'll mean one of the three, sometimes two of the three, sometimes all three.

Good point about inner alignment problems being a blocker to date-competitiveness for IDA... but aren't they also a blocker to date-competitiveness for every other alignment scheme too pretty much?

I think every alignment approach (other than interpretability-as-a-standalone-approach) that involves contemporary ML (i.e. training large neural networks) may have its date-competitiveness affected by inner alignment.

What alignment schemes don't suffer from this problem?

Most alignment approaches may have their date-competitiveness affected by inner alignment. (It seems theoretically possible to use whole brain emulation without inner alignment related risks, but as you mentioned elsewhere someone may build a neuromorphic AGI before we get there.)

I'm thinking "Do anything useful that a human with a lot of time can do" is going to be substantially less capable than full-blown superintelligent AGI.

I agree. Even a "narrow AI" system that is just very good at predicting stock prices may outperform "a human with a lot of time" (by leveraging very-hard-to-find causal relations).

Instead of saying we should expect IDA to be performance-competitive, I should have said something like the following: If at some point in the future we get to a situation where trillions of safe AGI systems are deployed—and each system can "only" do anything that a human-with-a-lot-of-time can do—and we manage to not catastrophically screw up until that point, I think humanity will probably be out of the woods. (All of humanity's regular problems will probably get resolved very quickly, including the lack of coordination.)

It's interesting how Paul advocates merging cost and performance-competitiveness, and you advocate merging performance and date-competitiveness.

Also I advocated merging cost and date competitiveness (into training competitiveness), so we have every combination covered.

Oh right, how could I forget! This makes me very happy. :D

Some thoughts that came to me after I wrote this post:

--I'm not sure I should define date-competitive the way I do. Maybe instead of "can be built" it should be "is built." If we go the latter route, the FOOM scenario is an extremely intense date competition. If we go the former route, the FOOM scenario is not necessarily an intense date competition; it depends on what other factors are at play. For example, maybe there are only a few major AI projects and all of them are pretty socially responsible, so a design is more likely to win if it can be built sooner, but it won't necessarily win; maybe cooler heads will prevail and build a safer design instead.

--Why is date-competitiveness worth calling a kind of competitiveness at all? Why not just say: "We want our AI safety scheme/design to be cost- and performance-competitive, and also we need to be able to build it fairly quickly compared to the other stuff that gets built." Well, 1. Even that is clunky and awkward compared to the elegant "...and also date-competitive." 2. It really does have the comparative flavor of competition to it; what matters is not how long it takes us to complete our safety scheme, but how long it takes relative to unaligned schemes, and it's not as simple as just "we need to be first," rather it's that sooner is better but doing it later isn't necessarily game over... 3. It seems to be useful for describing date competitions, which are important to distinguish from situations which are not date competitions or less so. (Aside: A classic criticism of the "Let's build uploads first, and upload people we trust" strategy is that neuromorphic AI will probably come before uploads. In other words, this strategy is not date-competitive.)

--I'm toying with the idea of adding "alignment-competitiveness" (meaning, as aligned or more aligned than competing systems) and "alignment competition" to the set of definitions. This sounds silly, but it would be conceptually neat, because then we can say: We hope for scenarios in which control of the future is a very intense alignment competition, and we are working hard to make it that way.