First things first: this post is about timelines (i.e. what year we first get transformative AGI), not takeoff speeds (i.e. foom vs gradual takeoff).

Claim: timelines are mostly not strategically relevant to alignment research, i.e. deciding what to work on. Why? Because at any given time, it would take ~18 months to take whatever our current best idea is, implement it, do some basic tests, and deploy it. (Really it probably takes less than 6 months, but planning fallacy and all that.) If AGI takeoff is more than ~18 months out, then we should be thinking “long-term” in terms of research; we should mainly build better foundational understanding, run whatever experiments best improve our understanding, and search for better ideas. (Note that this does not necessarily mean a focus on conceptual work; a case can be made that experiments and engineering feedback are the best ways to improve our foundational understanding.)

What about strategic decisions outside of object-level research? Recruitment and training strategies for new researchers might depend on how soon our investments need to pay off; do we look for a brilliant young person who will need three or five years of technical study before they’re likely to make any important progress, or a more experienced person who can make progress right now but is probably already near their skill ceiling? How much should existing technical researchers invest in mentoring new people? Those are questions which depend on timelines, but the relevant timescale is ~5 years or less. If AGI is more than ~5 years out, then we should probably be thinking “long-term” in terms of training; we should mainly make big investments in recruitment and mentorship.

General point: timelines are usually only decision-relevant if they’re very short. Like 18 months, or maybe 5 years for relatively long-term investments. The difference between e.g. 10 years vs 30 years vs 100 years may matter a lot for our chances of survival (and the difference may therefore be highly salient), but it doesn’t matter for most actual strategic decisions.

Meta note: there are a lot of obvious objections which I expect to address in the comments; please check whether anyone has already posted your objection.

I think timelines (as in, <10 years vs 10-30 years) are strongly correlated with the answer to "will first dangerous models look like current models", which I think matters more for research directions than what you allow in the second paragraph.
  
For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.

Yup, I definitely agree that something like "will roughly the current architectures take off first" is a highly relevant question. Indeed, I think that gathering arguments and evidence relevant to that question (and the more general question of "what kind of architecture will take off first?" or "what properties will the first architecture to take off have?") is the main way that work on timelines actually provides value.

But it is a separate question from timelines, and I think most people trying to do timelines estimates would do more useful work if they instead explicitly focused on what architecture will take off first, or on what properties the first architecture to take off will have.

I think timelines are a useful input to what architecture takes off first. If timelines are short, I expect AGI to look something like DL/Transformers/etc. If timelines are longer, there might be time for not-yet-invented architectures to take off first. There can be multiple routes to AGI, and "how fast do we go down each route" informs which one happens first.

Correlationally this seems true, but causally it's "which architecture takes off first?" which influences timelines, not vice versa.

Though I could imagine a different argument which says that the timeline until the current architecture takes off (assuming it's not superseded by some other architecture) is a key causal input to "which architecture takes off first?". That argument I'd probably buy.

I definitely endorse the argument you'd buy, but I also endorse a broader one. My claim is that there is information which goes into timelines which is not just downstream of which architecture I think gets there first.

For example, if you told me that humanity loses the ability to make chips "tomorrow until forever", my timeline gets a lot longer in a way that isn't just downstream of which architecture I think is going to happen first. That then changes which architectures I think are going to get there first (strongly away from DL), primarily by making my estimated timeline long enough for capabilities folks to discover some theoretically-more-efficient but far-from-implementable-today architectures.