I guess this falls into the category of "Well, we’ll deal with that problem when it comes up", but I'd imagine when a human preference in a particular dilemma is undefined or even just highly uncertain, one can often defer to other rules like--rather than maximize an uncertain preference, default to maximizing the human's agency, in scenarios where preference is unclear, even if this predictably leads to less-than-optimal preference satisfaction.
Still working my way through reading this series--it is the best thing I have read in quite a while and I'm very grateful you wrote it!
I feel like I agree with your take on "little glimpses of empathy" 100%.
I think fear of strangers could be implemented without a steering subsystem circuit maybe? (Should say up front I don't know more about developmental psychology/neuroscience than you do, but here's my 2c anyway). Put aside whether there's another more basic steering subsystem circuit for agency detection; we know that pretty early on, through some combination of instinct and learning from scratch, young humans and many animals learn there are agents in the world who move in ways that don't conform to the simple rules of physics they are learning. These agents seem to have internally driven and unpredictable behavior, in the sense their movement can't be predicted by simple rules like "objects tend to move to the ground unless something stops them" or "objects continue to maintain their momentum". It seems like a young human could learn an awful lot of that from scratch, and even develop (in their thought generator) a concept of an agent.
Because of their unpredictability, agent concepts in the thought generator would be linked to thought assessor systems related to both reward and fear; not necessarily from prior learning derived from specific rewarding and fearful experiences, but simply because, as their behavior can't be predicted with intuitive physics, there remains a very wide prior on what will happen when an agent is present.
In that sense, when a neocortex is first formed, most things in the world are unpredictable to it, and an optimally tuned thought generator+assessor would keep circuits active for both reward or harm. Over time, as the thought generator learns folk physics, most physical objects can be predicted, and it typically generates thoughts in line with their actual beahavior. But agents are a real wildcard: their behavior can't be predicted by folk physics, and so they perceived in a way that every other object in the world used to be: unpredictable, and thus continually predicting both reward and harm in an opponent process that leads to an ambivalent and uneasy neutral. This story predicts that individual differences in reward and threat sensitivity would particularly govern the default reward/threat balance otherwise unknown items. It might (I'm really REALLY reaching here) help to explain why attachment styles seem so fundamentally tied to basic reward and threat sensitivity.
As the thought generator forms more concepts about agents, it might even learn that agents can be classified with remarkable predictive power into "friend" or "foe" categories, or perhaps "mommy/carer" and "predator" categories. As a consequence of how rocks behave (with complete indifference towards small children), it's not so easy to predict behavior of, say, falling rocks with "friend" or "foe" categories. On the contrary, agents around a child are often not indifferent to children, making it simple for the child to predict whether favorable things will happen around any particular agent by classifying agents into "carer" or "predator" categories. These categories can be entirely learned; clusters of neurons in the thought generator that connect to reward and threat systems in the steering system and/or thought assessor. So then the primary task of learning to predict agents is simply whether good things or bad things happen around the agent, as judged by the steering system.
This story would also predict that, before the predictive power of categorizing agents into "friend" vs. "foe" categories has been learned, children wouldn't know to place agents into these categories. They'd take longer to learn whether an agent is trustworthy or not, particularly so if they haven't learned what an agent is yet. As they grow older, they get more comfortable with classifying agents into "friend" or "foe" categories and would need fewer exemplars to learn to trust (or distrust!) a particular agent.
Hey Steve, I am reading through this series now and am really enjoying it! Your work is incredibly original and wide-ranging as far as I can see--it's impressive how many different topics you have synthesized.
I have one question on this post--maybe doesn't rise above the level of 'nitpick', I'm not sure. You mention a "curiosity drive" and other Category A things that the "Steering Subsystem needs to do in order to get general intelligence". You've also identified the human Steering Subsystem as the hypothalamus and brain stem.
Is it possible things like a "curiosity drive" arises from, say, the way the telenchephalon is organized, rather than from the Steering Subsystem itself? To put it another way, if the curiosity drive is mainly implemented as motivation to reduce prediction error, or fill the the neocortex, how confident are you in identifying this process with the hypothalamus+brain stem?
I think I imagine the way in which I buy the argument is something like "steering system ultimately provides all rewards and that would include reward from prediction error". But then I wonder if you're implying some greater role for the hypothalamus+brain stem or not.
Very late to the party here. I don't know how much of the thinking in this post you still endorse or are still interested in. But this was a nice read. I wanted to add a few things:
- since you wrote this piece back in 2021, I have learned there is a whole mini-field of computer science dealing with multi-objective reward learning, maybe centered around . Maybe a good place to start there is https://link.springer.com/article/10.1007/s10458-022-09552-y
- The shard theory folks have done a fairly good job sketching out broad principles but it seems to me the homeostatic regulation does a great job of modulating which values happen to be relevant at any one time-- Xavier Roberts-Gaal recently recommended "Where do values come from?" to me and that paper sketches out a fairly specific theory for how this happens (I think it might be that more homeostatic recalculation happens physiologically rather than neurologically, but otherwise buy what they are saying)
- Continue to think the vmPFC is relevant because different parts are known to calculate value of different aspects of stimuli; this can be modulated by state from time to time. a recent paper in this by Luke Chang & colleagues is a neural signature of reward
That's right. What I mainly have in mind is a vector of Q-learned values V and a scalarization function that combines them in some (probably non-linear) way. Note that in our technical work, the combination occurs during action selection, not during reward assignment and learning.
I guess whether one calls this "multi-objective RL" is semantic. Because objectives are combined during action selection, not during learning itself, I would not call it "single objective RL with a complicated objective". If you combined objectives during reward, then I could call it that.
re: your example of real-time control during hunger, I think yours is a pretty reasonable model. I haven't thought about homeostatic processes in this project (my upcoming paper is all about them!). Definitely am not suggesting that our particular implementation of "MORL" (if we can call it that) is the only or even the best sort of MORL. I'm just trying to get started on understanding it! I really like the way you put it. It makes me think that perhaps the brain is a sort of multi-objective decision-making system with no single combinatory mechanism at all except for the emergent winner of whatever kind of output happens in a particular context--that could plausibly be different depending on whether an action is moving limbs, talking, or mentally setting an intention for a long term plan.
Interesting. Is it fair to say that Mollick's system is relatively more "serial" with fewer parallelisms at the subcortical level, whereas you're proposing a system that's much more "parallel" because there are separate systems doing analogous things at each level? I think that parallel arrangement is probably the thing I've learned most personally from reading your work. Maybe I just hadn't thought about it because I focus too much on valuation and PFC decision-making stuff and don't look broadly enough at movement and other systems.
Apropos of nothing, is there any role for the visual cortex within your system?
I too am puzzled about why some people talk about "mPFC" and others talk about "vmPFC". I focus on "vmPFC", mostly because that's what people in my field talk about. "vmPFC" focuses much more on valuation systems. Theoretically I guess "mPFC" would also include the dorsomedial prefrontal cortex, which includes the anterior cingulate cortex, I guess some systems related to executive control, perhaps response inhibition (although that's usually quite lateral), perhaps abstract processing. Tends to be a bit of a decision-making homunculous of sorts :/ And then there's the ACC, whose role in various things is fairly well defined.
So maybe authors who talk about the mPFC aren't as concerned about distinguishing value processing from all those other things.