Zach Stein-Perlman

AI forecasting & strategy at AI Impacts. Blog: Not Optional.



Huh, I claim Ajeya's timelines are much more coherent if we replace 2026 with 2027.5 or 2028.* As stated, the sequence of 10% between now and 2026, then 5% between 2026 and 2030, then 20% between 2030 and 2036 is really weird.

*Changing 2026 (rather than 2030) just because Ajeya's 2026 cumulative probability seems less considered than her 2030 and 2036 cumulative probabilities.
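A rough sketch of the arithmetic behind this, assuming the interval probabilities quoted above stay fixed (5% between the boundary year and 2030, 20% between 2030 and 2036) and only the boundary year moves. The function names are illustrative, and the first interval is left out because its per-year rate depends on when "now" is, which the comment doesn't state.

```python
def per_year(mass, start, end):
    """Average probability per year for `mass` spread evenly over [start, end]."""
    return mass / (end - start)

# 20% between 2030 and 2036, as quoted in the comment.
LATE_RATE = per_year(0.20, 2030, 2036)

def middle_rate(boundary):
    """Per-year rate for the 5% that falls between `boundary` and 2030."""
    return per_year(0.05, boundary, 2030)

# Compare the original boundary (2026) with the two proposed replacements.
for boundary in (2026, 2027.5, 2028):
    print(
        f"boundary {boundary}: {middle_rate(boundary):.2%}/yr until 2030, "
        f"then {LATE_RATE:.2%}/yr for 2030-2036"
    )
```

With the boundary at 2026, the middle interval averages 1.25% per year against roughly 3.33% per year afterwards; moving the boundary to 2027.5 or 2028 raises the middle rate to 2% or 2.5% per year, which is a much smaller jump.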

(+1. I totally agree that input growth will slow sometime if we don't get TAI soon. I just think you have to be pretty sure that it slows right around 2040 to have the specific numbers you mention, and smoothing out when it will slow down due to that uncertainty gives a smoother probability distribution for TAI.)

Good post!

I understand that the specific numbers in this post are "rough" and "volatile," but I want to note that 35% by 2036, 50% by 2040, and 60% by 2050 means 3.75% per year 2036–2040 and 1% per year 2040–2050, which is a surprisingly steep drop-off. Or as an alternative framing, conditional on TAI not having appeared by 2040, my expected credence in 2040 that TAI appears in the next 10 years is much greater than 20% (where 20% is your implied probability of TAI between 2040 and 2050, conditional on no TAI in 2040). My median timeline is somewhat shorter than yours, but my credence in TAI by 2050 is substantially higher.
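For anyone who wants to check the arithmetic, here is a minimal sketch that turns the three cumulative probabilities quoted above into per-year rates and conditional probabilities. The function name and data layout are my own, and it assumes probability is spread evenly within each interval.

```python
def implied_rates(cumulative):
    """Given (year, cumulative_probability) pairs sorted by year, return
    (start, end, mass, per_year, conditional) for each consecutive interval.

    per_year    = probability mass in the interval / interval length
    conditional = P(event in interval | no event by the interval's start)
    """
    rows = []
    for (y0, p0), (y1, p1) in zip(cumulative, cumulative[1:]):
        mass = p1 - p0
        rows.append((y0, y1, mass, mass / (y1 - y0), mass / (1 - p0)))
    return rows

# Cumulative probabilities quoted from the post: 35% by 2036, 50% by 2040, 60% by 2050.
forecast = [(2036, 0.35), (2040, 0.50), (2050, 0.60)]

for y0, y1, mass, per_year, conditional in implied_rates(forecast):
    print(
        f"{y0}-{y1}: {mass:.0%} total, {per_year:.2%}/yr, "
        f"{conditional:.0%} conditional on no TAI by {y0}"
    )
```

This reproduces the 3.75% per year for 2036–2040, the 1% per year for 2040–2050, and the 20% conditional probability for 2040–2050.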

(That said, I lack something like the knowledge, courage, or epistemic virtue to be more explicit about my timelines, because it's hard; strong-upvote for this useful and virtuous post, and thanks for using specific numbers so much.)

How optimistic should we be about alignment & safety for brain-like-AGI, relative to prosaic AGI?

Ask dumb questions! ... we encourage people to ask clarifying questions in the comments of this post (no matter how “dumb” they are)

ok... disclaimer: I know little about ML and I didn't read all of the report.

All of our counterexamples are based on an ontology mismatch between two different Bayes nets, one used by an ML prediction model (“the predictor”) and one used by a human.

I am confused. Perhaps the above sentence is true in some tautological sense I'm missing. But in the sections of the report listing training strategies and corresponding counterexamples, I wouldn't describe most counterexamples as based on ontology mismatch. And the above sentence seems in tension with this from the report:

We very tentatively think of ELK as having two key difficulties: ontology identification and learned optimization. ... We don’t think these two difficulties can be very precisely distinguished — they are more like genres of counterexamples

So: do some of your training strategies work perfectly in the nice-ontology case, where the model has a concept of "the diamond is in the room"? If so, I missed this in the report and this feels like quite a strong result to me; if not, there are counterexamples based on things other than ontology mismatch.

Ha, I wrote a comment like yours but slightly worse, then refreshed and your comment appeared. So now I'll just add one small note:

To the extent that (1) normatively, we care much more about the rest of the universe than our personal lives/futures, and (2) empirically, we believe that our choices are much more consequential if we are non-simulated than if we are simulated, we should in practice act as if there are greater odds that we are non-simulated than we have reason to believe for purely epistemic purposes. So in practice, I'm particularly interested in (C) (and I tentatively buy SIA doomsday as explained by Katja Grace).

Edit: also, isn't the last part of this sentence from the post wrong:

SIA therefore advises not that the Great Filter is ahead, but rather that we are in a simulation run by an intergalactic human civilization, without strong views on late filters for unsimulated reality.