I find myself somewhat confused as to why I should find Part I of “What failure looks like” (hereafter "WFLL1", like the pastry) likely enough to be worth worrying about. I have 3 basic objections, although I don't claim that any are decisive. First, let me summarize WFLL1 as I understand it:
In general, it's easier to optimize easy-to-measure goals than hard-to-measure ones, but this disparity is much larger with ML models than with humans and human-made institutions. As special-purpose AI becomes more powerful, this will lead to a form of differential progress where easy-to-measure goals are optimized well past the point where they correlate with what we actually want.
(See also: this critique, although I agree with the existing rebuttals to it).
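The mechanism in that summary can be sketched with a toy model (my own illustration, not from the original post): a "true" objective and an easy-to-measure proxy that track each other only over a limited range, with an optimizer that can only see the proxy.

```python
import numpy as np

# Toy illustration (invented numbers): the proxy keeps rewarding more x,
# while the true objective peaks and then declines.
def true_value(x):
    return x - 0.1 * x**2  # peaks at x = 5, then falls off

def proxy(x):
    return x  # the measurable metric just says "more is better"

xs = np.linspace(0, 20, 201)
x_proxy_opt = xs[np.argmax(proxy(xs))]       # optimizer that sees only the proxy
x_true_opt = xs[np.argmax(true_value(xs))]   # what we actually wanted

print(x_proxy_opt, x_true_opt)                       # 20.0 vs 5.0
print(true_value(x_proxy_opt), true_value(x_true_opt))  # -20.0 vs 2.5
```

Up to x = 5 the proxy and the true objective agree; a powerful-enough proxy optimizer sails straight past that point, which is the disparity the summary describes.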
Objection 1: Historical precedent
In the late 1940s, George Dantzig invented the simplex algorithm, a practically efficient method for solving linear optimization problems. At the same time, the first modern computers were arriving, and as a mathematician in the US military he had access to them. For Dantzig and his contemporaries, a wide class of previously intractable problems suddenly became solvable, and they did use the new methods to great effect, playing a major part in developing the field of operations research.
With the new tools in hand, Dantzig also decided to use simplex to optimize his diet. After carefully poring over prior work, and putting in considerable effort to obtain accurate data and correctly specify the coefficients, Dantzig was now ready, telling his wife:
whatever the [IBM] 701 says that's what I want you to feed me each day starting with supper tonight.
The result included 500 gallons of vinegar.
After vinegar was removed from the food list, the next round came back with 200 bouillon cubes per day. There were several more iterations, none of which worked, and in the end Dantzig simply went with a "common-sense" diet.
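The structure of the failure is easy to reproduce in miniature. Here is a minimal sketch of a diet-style linear program (foods and coefficients entirely invented, not Dantzig's actual data), using scipy's `linprog`: the objective knows only about cost and nutrient minimums, so nothing stops it from recommending an absurd quantity of the cheapest food.

```python
from scipy.optimize import linprog

# Toy diet LP in the spirit of Dantzig's experiment; all numbers invented.
# Foods: [vinegar, bread]. Each row of `nutrients` is one nutrient,
# each column one food.
cost = [0.01, 0.50]          # cost per unit; vinegar is very cheap
nutrients = [[5, 100],       # calories supplied per unit of each food
             [2, 0]]         # vitamin C supplied per unit of each food
minimums = [2000, 60]        # daily requirements for each nutrient

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub, so the
# ">= minimum" constraints are encoded with flipped signs.
res = linprog(c=cost,
              A_ub=[[-n for n in row] for row in nutrients],
              b_ub=[-m for m in minimums],
              bounds=[(0, None)] * 2)
vinegar, bread = res.x
print(vinegar, bread)  # the cost-optimal "diet" is essentially all vinegar
```

The solver is doing exactly what it was asked: vinegar is the cheapest calorie source, so the optimum is hundreds of units of vinegar and no bread. The objective was correctly specified and correctly optimized; it just wasn't the thing anyone actually wanted.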
The point I am making is, whenever we create new methods for solving problems, we end up with a bunch of solutions looking for problems. Typically, we try to apply those solutions as widely as possible, and then quickly notice when some of those solutions don't solve the problems we actually want to solve.
Suppose that around 1950, we were musing about the potential consequences of the coming IT revolution. We might've noticed that we were entering the era of the algorithm, where a potentially very wide class of problems could be solved--if they could be reduced to arithmetic and run on the new machines, with their scarcely fathomable ability to memorize a lot and calculate in mere moments. And we could ask "But what about love, honor or justice? Will we forget about those unquantifiable things in the era of the algorithm?" [excuse me if this phrasing sounds snarky] And yet, in the decades since, we seem to have basically just used computers to solve the problems we actually want to solve, and we don't seem to have stopped valuing the things that aren't under their scope.
If we round off WFLL1 to "when you have a hammer, everything looks like a nail", then this only seems mildly and benignly true in the case of most technologies, i.e. the trend seems to be that if technology A makes us better at doing some class of tasks X, we poke around to see just how big X is, until we've delineated the border well and stop, with the exploratory phase rarely causing large-scale harm.
I don't think the OP is intending WFLL1 to say something this broad, but then I feel it should be clarified why "this time is different", such as why modern D(R)L should be fundamentally different from linear optimization, the IT revolution, or even non-deep ML.
(I think the discontinuity-based arguments largely do make the "this time is different" case, roughly because general intelligence seems clearly game-changing. WFLL2 seems somewhere in between these, and I'm unsure where my beliefs fall on that.)
Objection 2: Absence of evidence
I don't see any particular evidence of WFLL1 unfolding as we conclude the 2010s. As I understand it, WFLL1 should gradually "show up" well before AGI, but given that a lot of ML is already deployed, this at least raises the question of when it should be expected to become noticeable, in terms of the capabilities the AI systems would need.
Objection 3: Why privilege this axis (of differential progress)?
It seems likely that if ML continues to advance substantially over the coming decades (as much as the rate 2012-2019), then it will cause substantial differential progress. But along what axes? WFLL1 singles out the axis "easy-to-measure vs. hard-to-measure", and it's not clear to me why we should worry about this in particular.
For instance, there's also the axis "have massive datasets vs. don't have massive datasets". And we could point to various examples of this form, e.g. it's easy to measure a country's GDP year over year, but we can get at most a few hundred data points on this, hence it's completely unsuitable for DL. So, for instance, we could see differential progress on microeconomics vs. macroeconomics.
More generally, we could ask what things DL seems weak at:
- Performance at the task must be easy to measure
- A massive, labelled, digitized training set must exist (or can be easily made with e.g. self-play)
- DL seems relatively weak at learning causality
- (Other things listed by e.g. Gary Marcus)
And from there, we could reasonably extrapolate to what DL will be good/bad at, relative to the baseline of human thinking/heuristics.
WFLL1 seems to basically say: "here's this axis of differential progress (arising from a limitation of DL), and here are some examples of ways things can go wrong as a result". But for any other limitation we list, I'd suspect we can also list examples such as "if DL is really capable in general but really bad at causal modeling, here's a thing that can go wrong."
At least to me, the ease-of-measurement bullet point does not seem to pop out as a very natural category: if interpreted broadly, it does not capture everything that seems plausibly important, and if interpreted narrowly, it does not seem narrow enough to focus our attention on any one interesting failure mode.
I'm pretty sure WFLL1 only applies in the case where AI is "responsible for" some very large fraction of the economy (I imagine >90%), for which we don't really have much of a historical precedent.
When I imagine WFLL1 that doesn't turn into WFLL2, I usually imagine a world in which all existing humans lead great lives, but don't have much control over the future. On a moment-to-moment basis, that world is better than the current world, but we don't get to influence the future and make use of the cosmic endowment, and so from a total view we have lost >99% of the potential value of the future. Such a world can still include love, honor and justice among the humans who are still around.
On the other hand, the last time I mentioned this among ~6 people, all at least interested in AI safety, not a single other person shared this impression; instead they found WFLL1 convincing as an example of a world that is moment-to-moment worse than the current world, but still not WFLL2.
AI has a very minor economic impact right now, but even so, I'd argue that the concerns over fairness and bias in AI are evidence of WFLL1, since we can't measure the "fairness" of a classifier.
Mostly that for all the other axes you name, I expect deep learning to eventually become capable along those axes. To be fair, I also think that deep learning models will eventually be able to do what we mean rather than what we measure, but that seems like the axis most likely to fail. (I do find the dataset axis somewhat convincing, but even there I expect self-supervised learning to make it less important.)
I was uncertain about this, but it seems this is at least what Paul intended. From here, about WFLL1:
Suppose it was easy to create automated companies, and skim a bit off the top. AI algorithms are just better at business than any startup founder. Soon some people create these algorithms, give them a few quid in seed capital and leave them to trade and accumulate money. The algorithms rapidly increase their wealth, and soon own much of the world economy. Humans are removed when the AIs have the power to do so at a profit. This ends in several superintelligences tiling the universe with economium together.
For this to happen, we need
1) The doubling time of fooming AI is months to years, to allow many AIs to be in the running.
2) It's fairly easy to set an AI to maximize money.
3) The people who care about complex human values can't effectively make an AI to do that.
4) Any attempt to stamp out all fledgling AIs before they get powerful fails (helped by anonymous cloud computing).
I don't really buy 1), though it is fairly plausible. I'm not convinced of 2) either, although it might not be hard to build a mesa optimiser that cares about something sufficiently correlated with money that humans are beyond caring before any serious deviation from money optimization happens.
If 2) were false, and the people who tried to make AIs all got paperclip maximisers, the long-run result is just a world filled with paperclips rather than banknotes. (Although this would make coordinating to destroy the AIs a little easier?) The paperclip maximisers would still try to gain economic influence until they could snap their nanotech fingers.