For instance, a money-maximising trade-bot AI could be perfectly safe if it notices that money, in its initial setting, is just a proxy for humans being able to satisfy their preferences.
There is a critical step missing here: the moment when the trade-bot makes a "choice" between maximising money and satisfying preferences. At this point, I see two possibilities: either the bot sticks with maximising money, in which case it remains unsafe, or it switches to satisfying preferences, which presupposes that alignment was already achieved by some other means.
So I'd be focusing on "do the goals stay safe as the AI gains situational awareness?", rather than "are the goals safe before the AI gains situational awareness?"
This is a false dichotomy. If we assume that the AI, upon gaining situational awareness, will optimize for its developers' goals, then alignment is already solved. And making the goals safe before situational awareness is not that hard: at that point, the AI is not capable enough to pose X-risk. (A discussion of X-risk brought about by situationally unaware AIs could be interesting, such as a Christiano failure story, but Soares's model is not about it, since it assumes autonomous ASI.)