
posted on 2022-09-16 — also cross-posted on lesswrong, see there for comments

(this post has been written for the third Refine blog post day)

ordering capability thresholds

given an AI which is improving towards ever more capabilities, such as by way of recursive self-improvement, in what order will it pass the following points?

throughout this post i'll be using PreDCA as an example of a formal goal to be maximized, because it appears to me as a potentially promising direction; but you can imagine adapting this post to other formal goals such as insulated goal-programs, or other alignment strategies altogether. we can even use this time-ordering framework to compare the various thresholds of multiple alignment strategies, though i won't do that here.

with a few notes:

one thing that can be noticed is that humans might serve as evidence. for example, we can examine history to figure out whether we passed Math or would've been able to pass PreDCA (given a reasonable description of it) before getting to Doom — my guess is yes at least for that latter one.

now, we can reasonably guess the following pieces of ordering, where as usual in ordering graphs X → Y means X < Y and transitive edges are not shown.
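as a minimal sketch of how such a graph can be read, here is some python that takes only the drawn edges and recovers every comparison they imply by following paths. the threshold names `A` to `D` and the edges between them are placeholders assumed purely for illustration; they are not the orderings guessed in this post.

```python
from collections import defaultdict

def implied_orderings(edges):
    """return every pair (x, y) such that x < y follows from the drawn edges."""
    successors = defaultdict(set)
    for x, y in edges:
        successors[x].add(y)

    def reachable(start):
        # depth-first walk collecting everything that comes after `start`
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    nodes = {n for edge in edges for n in edge}
    return {(x, y) for x in nodes for y in reachable(x)}

# hypothetical edges (placeholders, not the post's actual graph)
edges = [("A", "B"), ("B", "C"), ("A", "D")]
closure = implied_orderings(edges)

print(("A", "C") in closure)  # implied by A → B → C even though no edge is drawn
print(("C", "D") in closure or ("D", "C") in closure)  # C and D are left unordered
```

pairs that appear in neither direction, like `C` and `D` here, are exactly the ones the graph leaves open.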

in addition, for any two quantities X < Y, it can be the case that they're pretty close in time X ≈ Y, or it can be that there's a bunch of time between them X ≪ Y. whether the threshold between those two possibilities is more like a day or a year, is gonna depend on context.
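to make the dependence on context concrete, here is a small python sketch that labels a pair of thresholds as ≈ or ≪ given the times at which they are passed and a cutoff. the function name `relation`, the crossing times, and the cutoffs are all assumptions made up for the example.

```python
from datetime import timedelta

def relation(t_x, t_y, cutoff):
    """label X and Y as "≈" (close in time) or "≪" (far apart), assuming X < Y."""
    if not t_x < t_y:
        raise ValueError("expected X to be passed strictly before Y")
    gap = t_y - t_x
    return "≈" if gap <= cutoff else "≪"

# hypothetical crossing times, measured from an arbitrary starting point
t_x = timedelta(days=0)
t_y = timedelta(days=30)

with_day_cutoff = relation(t_x, t_y, cutoff=timedelta(days=1))
with_year_cutoff = relation(t_x, t_y, cutoff=timedelta(days=365))
print(with_day_cutoff, with_year_cutoff)
```

the same thirty-day gap counts as ≪ when the relevant cutoff is a day and as ≈ when it is a year.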

depending on how the rest of the ordering graph turns out and how close pairs of subsequent events are in time, we can be in a variety of situations:

finally, some claims that i strongly disbelieve in can still be expressed within this capabilities ordering framework, such as E ≪ D or that, given a theoretical maximum level of AI capability Max, Max < Doom or even Max < DSA.


unless explicitly mentioned, all content on this site was created by me; not by others nor AI.