One major question that heavily influences the choice of alignment research directions is the degree to which incremental improvements are necessary for major paradigm shifts. Because alignment is largely preparadigmatic, we may require a paradigm shift before we can make substantial progress towards aligning superhuman AI systems, rather than merely incremental improvements. The answer to this question determines whether the best approach to alignment is to choose metrics and try to make incremental progress on alignment research questions, or to mostly fund things that are long shots, or something else entirely. It also informs what policy we should take with respect to capabilities acceleration as an externality to alignment work: the degree to which incremental improvements in capabilities lead to paradigm shifts in capabilities informs how much we should worry about incremental capabilities improvements as a byproduct.   

Some possible ways-the-world-could-be include:

  • Incremental improvements have negligible impact on when paradigm shifts happen and could be eliminated entirely without affecting their timing. All or the vast majority of incremental work is visible from the start as low-risk, low-reward, and work that could cause a paradigm shift is visible from the start as high-risk, high-reward.
  • Incremental improvements serve to increase attention in the field and thus increase the amount of funding for the field as a whole, thereby proportionally increasing the absolute number of people working on paradigmatic directions, but funding those working on potential paradigm shifts directly would yield the same paradigm shifts at the same time.
  • Incremental improvements are necessary to convince risk-averse funding sources to continue funding something, since putting money into something for years with no visible output is not popular with many funders. This forces researchers to divert a certain % of their time to working on funder-legible incremental improvements.
  • Most paradigm shifts arise from attempts to make incremental improvements that accidentally uncover something deeper in the process. It is difficult to tell before embarking on a project whether it will only yield an incremental improvement, no improvement at all, or a paradigm shift.
  • Most paradigm shifts cannot occur until incremental improvements lay the foundation for the paradigm shift to happen, no matter how much effort is put into trying to recognize paradigm shifts.
  • Something else?

I think figuring out which of these universes we're in would be enormously valuable and seems especially high leverage. I think in particular there are likely to be many different perspectives on this and a lot of productive disagreement between those perspectives, so I don't think multiple people working on this has a stepping-on-toes effect. 

2 comments

It could be useful to look at historical paradigm shifts to get an understanding of how this might take place.

Epistemic status: I’m an ML PhD student, just an amateur in economic history, but I have read a lot. So a bit under medium confident maybe?

Much of what I’ve read from the history of science, technology, or economic history doesn’t seem directly relevant [1].

The closest to what we are looking for is Mokyr’s distinction between macro and micro inventions. Initially, macroinventions were radical, novel breakthroughs, while microinventions came from incremental progress, though over time the meanings shifted towards economic impact. As I think it is more confusing to have both meanings, I’ll stick with the initial one, and call events with large impact revolutionary. Making this even trickier is that most macroinventions are fairly inefficient, but are vastly improved through a series of iterative microinventions. If we assume fast take-off, we don’t get the follow-up series of iterative improvements to fix things.

As not a lot is known about where macroinventions come from, we’ll have to figure this out ourselves from what data exists. I think there are two ways of looking at it that are valuable. First, do revolutionary discoveries come out of iterative improvements, or out of more radical macroinventions? Secondly, given that we are pre-paradigmatic, and don’t have any great insight as to where to go, more understanding of this would be quite useful. So I will look at the extent to which understanding comes from iterative improvements and examining existing technologies, versus coming from understanding in other areas [2]. I think inventions are also a good place to start because many of the processes here are driven by inspired tinkering [3], which I think maps well to where ML is at.
 

The answer to both of these questions seems to be, well, both happen quite frequently.

The classic example of both revolutionary changes and greater scientific understanding coming from iterative improvements is the development of steam engines and thermodynamics. The first practical steam engine, Newcomen’s atmospheric engine, required knowledge of atmospheric pressure and vacuums: cooling and condensing steam in a cylinder created a partial vacuum, and the motion of the engine was driven by the weight of the atmosphere pushing the piston down into the partially evacuated cylinder. However, knowledge of thermodynamics was still over a century off; even things like specific heat were unknown. A century of incremental improvements gave rise to several revolutionary breakthroughs, such as Watt’s separate condenser, where Watt realized repeatedly heating and cooling water was inefficient, and it was better to separate the stages. This took quite a bit of engineering to get right, and he finagled a patent that blocked the next incremental revolutionary advance: high-pressure steam engines, where high-pressure steam was used to drive the engine. This also required a lot of engineering effort to work; without improvements in metallurgy, boring, and precision engineering, a high-pressure steam engine would have been very difficult to build. Carnot, widely credited with founding thermodynamics, developed his ideal model of heat engines from preexisting engines. Understanding thermodynamics opened up a new paradigm for engine building, and allowed the construction of, say, Diesel's diesel engine.

An example of the opposite would be the development of medicine and public health. Initial early modern attempts to improve health had some real breakthroughs: the cowpox vaccine was revolutionary, several preventatives to scurvy were found (then many forgotten because the underlying principle wasn’t grasped), careful record-keeping indicated the importance of clean water, and so on. But without an understanding of germ theory, possible advances in vaccines, antibiotics, and sanitation were limited. Only after germ theory was developed, relying heavily on studies of microbiology, and much less on the theory of miasma or random knowledge that sauerkraut prevents scurvy, were these problems overcome and a new paradigm became available.

There are many examples somewhere in between too, where incremental improvements were either somewhat important for understanding, or understanding was mixed with incremental improvements. I like Jason Crawford’s example of the invention of the transistor, where initial theories were tested and proven wrong. The next round of tinkering with implementations would drive the next theory, and so on until they converged.

Something like chemistry probably falls in a similar bucket. A lot of early discoveries came from chance findings that drove further experimentation. Sometimes they were quite random, such as the first modern synthetic dye, discovered by a Brit trying to synthesize quinine. Rubber was also first synthesized through this type of experimentation. But after a point, (mostly German) discoveries of underlying chemical structures were necessary to drive the process forward. Early chemistry came from a hodgepodge of alchemy, metallurgy, and so on, but if we just had, say, dye experiments, getting there would have certainly taken much longer. Once some chemistry was known, people could create more: Leo Baekeland’s discovery of Bakelite, for example, was the major breakthrough in synthetic materials. Yet macromolecular chemical theories lagged behind this discovery, and could only explain it in the 1920s. There was a leapfrog effect between incremental improvements and understanding, as each drove the other.

Other sectors like transportation were driven by a mix of incremental improvements and breakthroughs in other fields. Trains were probably more a micro than a macro invention, but were certainly revolutionary. Tracks had been placed in mines starting as early as the 12th century, initially made of wood, with wheeled, people-powered carts. The development of cast iron made iron tracks much more economical, while the development of steam engines gave trains the ability to propel themselves. More recent developments of trains, such as bullet trains in Japan, followed a similar process of piecing together previously existing inventions, to the point that when Japan was first trying to float a loan to fund it, their officials were able to successfully argue that it wasn’t a new invention, and the World Bank could legally fund it (as they were only allowed to fund infrastructure development, not research).

A rather worrying trend is that, while development after macro or revolutionary inventions is quite fast, it can take a long time to get between them. The initial idea that science could revolutionize the world was thought of quite early in the scientific revolution: Francis Bacon was writing utopian fiction in the 1620s, but it took until the late 1800s for his vision to be proven even partially true. From the discovery of the cowpox vaccine to germ theory was 75 years, from Newcomen’s engine to Watt’s first steam engine was ~50 years, and to even the start of thermodynamics was over a century. Things have sped up somewhat, but it still takes time.

I checked if there was a study to back up my intuition here, and there is. AI Impacts’ study of patterns around technological advancements found: “Many big, famous, impressive, and important advances are preceded by lesser-known, but still notable advances. These lesser-known advances often follow a long period of stagnation”, but breakthroughs are followed by a boom of further discovery and improvement. As we are preparadigmatic for alignment and in a post-revolutionary boom phase in capabilities, from a historical perspective we look as screwed as we do from any other one.

Given that breakthroughs in other fields have been quite important, I think it would make sense to develop any alignment-adjacent areas that don’t increase the speed of capability development (certain types of mathematics? Not really sure here).

My own intuition is that we are screwed enough that it is worth risking some incremental gains that can benefit capabilities, even though they might lead to a revolutionary breakthrough and kill us all, just because we are too far behind and, in other fields at least, incremental progress can lead to new paradigms. But we need to be aware this is a very real risk and take steps to mitigate it. My instinct is that a PASTA-esque tool to either automate or help us do alignment work might be a necessary risk, and perhaps more interpretability work too, even if it does make transformers stronger. I’ve only been alignment pilled for the past couple of months though, so take this with a massive pile of salt.

I got a lot of the details in this from Mokyr’s “The Lever of Riches”; he has a nice summary of the development of second industrial revolution technology as well. More modern stuff comes from the two-part series “Creating the Twentieth Century” and “Transforming the Twentieth Century” by Smil. If you think this approach is valuable at all, it might also be worth either taking a more systematic look at this, or talking to an actual historian as well?

  1. For example, in a lot of the work on paradigm shifts, in Kuhn and in later work such as “The Invention of Science” by Wootton, most of the focus is on why paradigm shifts happen, what caused the Scientific Revolution to happen, etc.
  2. Following Anton Howes
  3. Scientific understanding was increasingly important through the second industrial revolution, and it is likely many fields would have hit a dead end without it.
[-][anonymous]2y10

I would suggest you enumerate the bullet points.

Personally, I'm inclined to believe we live in world 2). I expect most people on LessWrong to believe 4) and 5). 3) seems to be orthogonal to the other claims.