Breaking Down Goal-Directed Behaviour

When we speak about entities 'wanting' things, or having 'goal-directed behaviour', what do we mean?

Here I take steps towards breaking down 'goal-directed behaviour' into a conceptual framework of computational abstractions, for which I offer tentative terminology, and which helps me to better understand and describe analogies and disanalogies between various goal-directed systems. The overarching motivation is to better understand goal-directed behaviour, in the sense of being better able to predict its implications (especially counterfactual and off-distribution ones), how it arises, and its other properties. Hopefully it is clear why I consider this worthwhile.

AI ALIGNMENT FORUM
AF
