I think this came up in the previous discussion as well: an AI that could competently design a nanofactory could also have the capability to manipulate humans at a high level. For example:
Then when the system generalizes well enough to solve domains like "build a nanosystem" - which, I strongly suspect, can't be solved without imaginative reasoning because we can't afford to simulate that domain perfectly and do a trillion gradient descent updates on simulated attempts - the kind of actions of thoughts you can detect as bad, that might have provided earlier warning, were trained out of the system by gradient descent; leaving actions and thoughts you can't detect as bad.
Even within humans, it seems we have people, e.g. on the autistic spectrum, who I can imagine having the imaginative reasoning & creativity required to design something like a nanofactory (at 2-3 SD above the average human) while also being 2-3 SD below the average human at manipulating other humans. At the least, this suggests those 2 things may not be the same general-purpose cognition or use the same "core of generality".
While this separation is not guaranteed by default in the first AI system capable of nanosystem design, it seems like it shouldn't be impossible to achieve with more research.