Why would they suddenly start having thoughts of taking over, if they never have yet, even if it is in the training data?
This is the crux of the disagreement with the Bostrom/Yudkowsky crowd. Your syllogism seems to be
1. AGI will be an LLM
2. Current LLMs don't exhibit power-seeking tendencies
3. The current training paradigm seems unlikely to instill power-seeking
4. Therefore, AGI won't be power-seeking
I basically agree with steps 1 through 3. Where I diverge (and I think I can speak for the Bostrom/Yud crowd on this) is at step 4: to me, all of this is just evidence that current LLMs aren't AGI, so the conclusion doesn't follow.
I think people are extrapolating far too much from the current state of AI alignment to what AGI alignment will be like. True AGI, and especially ASI, will be a dramatic phase change, with characteristics that differ in important ways from current LLMs.
(Relevant Yudkowsky rant: https://x.com/ESYudkowsky/status/1968414865019834449)