This post is a result of numerous discussions with other participants and organizers of the MIRI Summer Fellows Program 2019. It describes ideas that are likely already known by many researchers. However, given how often disagreements about slow/fast takeoffs come up, I believe there is significant value in making them common knowledge.
In Superintelligence, Nick Bostrom distinguishes between slow, medium, and fast AI takeoff scenarios (where the takeoff speed is measured by how much real-world time passes between the milestones of human-level AI (HLAI) and superintelligent AI (SAI)).
He argues that slow takeoff should be reasonably safe since humanity would have sufficient time to coordinate and solve the AI alignment problem, while fast takeoff would be particularly dangerous since we wouldn't be able to react to what the AI does.
In many scenarios, the real-time takeoff speed indeed strongly correlates with our ability to influence the outcome. However, we can also imagine many scenarios where this is not the case.
As an example, suppose we obtain HLAI by simulating humans in virtual environments, and that this procedure additionally fully preserves the simulated humans' alignment with humanity. Since this effectively increases the speed at which humanity operates, we might get a "fully controlled takeoff" even if the transition from HLAI to SAI only takes a few days of real-world time.
More generally, if our path to HLAI also increases the effectiveness of humanity's efforts, the "effective time" we get between HLAI and SAI will scale accordingly. For example, this might be the case if we go the way of Iterated Distillation and Amplification or Comprehensive AI Services. Less controversially, suppose we automate most of the current programming tasks and increase the re-usability of code, such that every computer scientist becomes 100 times as effective as they are now.
Given these examples, I think we should measure takeoff speeds not in real-world time, but rather in (some operationalization of) the work-towards-AI-alignment that humanity will be able to do between HLAI and SAI. Anecdotal examples of such measures might include "integral of the human-originating GDP between HLAI and SAI" or "number of AI safety papers published between HLAI and SAI".
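To make the "integral" idea concrete, here is a minimal sketch of one possible operationalization, assuming we could summarize humanity's alignment-relevant productivity as a single multiplier over today's baseline. The function name `effective_time`, the multiplier functions, and all numbers are hypothetical illustrations, not a proposed standard; the 100x figure simply reuses the programming example above.

```python
from typing import Callable


def effective_time(
    multiplier: Callable[[float], float],
    t_hlai: float,
    t_sai: float,
    steps: int = 1000,
) -> float:
    """Integrate a productivity multiplier over real time (trapezoidal rule).

    multiplier(t) is a hypothetical measure of how effective humanity's
    work is at real-world time t, relative to a baseline of 1.0.
    The result is in "baseline-years" if t is measured in years.
    """
    if t_sai < t_hlai:
        raise ValueError("SAI must not precede HLAI")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    dt = (t_sai - t_hlai) / steps
    total = 0.0
    for i in range(steps):
        left = t_hlai + i * dt
        right = left + dt
        total += 0.5 * (multiplier(left) + multiplier(right)) * dt
    return total


# Assumed scenario A: slow takeoff (10 real years), no speed-up.
slow = effective_time(lambda t: 1.0, 0.0, 10.0)

# Assumed scenario B: fast takeoff (0.05 real years, about 18 days),
# but everyone is 100 times as effective throughout.
fast = effective_time(lambda t: 100.0, 0.0, 0.05)

print(f"slow takeoff: {slow:.2f} baseline-years")
print(f"fast takeoff: {fast:.2f} baseline-years")
```

Under these made-up numbers, the "fast" takeoff still yields about half as much effective time as the decade-long "slow" one, even though it is roughly 200 times shorter in real-world time. The hard part, of course, is choosing a multiplier that actually tracks alignment-relevant work.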
I believe that finding a non-anecdotal operationalization would benefit many AI policy/strategy discussions.
Recall that Bostrom distinguishes between speed, collective, and quality superintelligence. Arguably, being able to simulate humans (with enough compute) already constitutes a speed superintelligence. However, I don't think this diminishes the overall point of the post.
This is an interesting take on framing Takeoff Dynamic Timelines -- and I understand why it became your emphasis -- the ability to align the AI is the outcome that really matters in that situation.
That said, I see two major challenges with defining time through any form of operationalization standards:
Without either a clear operational input (capacity and effectiveness will likely fluctuate wildly over the coming years / decades) or a clear operational output (we do not have a clear sense of the amount of input needed to get our desired outcome), it's not a highly effective measure.
Specifically, not knowing either of these would lead to the conclusion that "we want more clock time", and perhaps more "crisis time" specifically, to maximize our operational inputs and thus the probability of a successful alignment outcome.
All that said, I think you hit on a key idea here: we need to make sure that we are measuring and tracking what actually matters, namely our ability to safely align the AI in question.
Curious if you think I missed something here?