Overall, I think the issue of causal confusion and OOD misgeneralisation is much more about capabilities than about alignment, especially if we are talking about the long-term x-risk from superintelligent AI, rather than short/mid-term AI risk.
OOD misgeneralisation is absolutely inevitable, due to Gödel's incompleteness of the universe and the fact that all the systems that evolve on Earth generally climb up in complexity. Whenever there is a new invention, such as money, interne... (read more)
My current favourite notion of agency, primarily based on Active Inference, which I refined upon reading "Discovering Agents", is the following:
Agency is a property of a physical system from some observer’s subjective perspective. It stems from the observer’s generative model of the world (including the object in question), specifically whether the observer predicts the agent's future trajectory in the state space by assuming that the agent has its own generative model which the agent uses to act. The agent's own generative model also depends on (adapts to... (read more)