Nick Bostrom introduced the idea of a treacherous turn for advanced AIs:
While weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable, it pursues its own values.
Ben Goertzel criticised this thesis, pointing out that:
for a resource-constrained system, learning to actually possess human values is going to be much easier than learning to fake them. This is related to the everyday observation that maintaining a web of lies rapidly gets very complicated.
This argument has been formalised as the sordid stumble:
An AI that lacks human-desirable values will behave in a way that reveals its human-undesirable values to humans before it gains the capability to deceive humans into believing that it has human-desirable values.