This was an amazing article, thank you for posting it!
Side tangent: There’s an annoying paradox that: (1) In RL, there’s no “zero of reward”, you can uniformly add 99999999 to every reward signal and it makes no difference whatsoever; (2) In life, we have a strong intuition that experiences can be good, bad, or neutral; (3) ...Yet presumably what our brain is doing has something to do with RL! That “evolutionary prior” I just mentioned is maybe relevant to that? Not sure … food for thought ...
The above isn't quite true in all senses in all RL algorithms. F...
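For concreteness, here's a minimal sketch of claim (1) in the one setting where I'm confident it holds exactly: an infinite-horizon, discounted, tabular MDP. The two-state MDP, the names (`NEXT`, `REWARD`, `q_iteration`, `greedy_policy`), and the discount factor are all made up for illustration. Adding a constant c to every reward shifts every Q-value by c / (1 - gamma) and leaves the greedy policy alone. It doesn't show anything about other setups, such as episodic tasks with variable episode length, where a uniform shift can change which policy is best.

```python
# Toy deterministic MDP (hypothetical, for illustration only).
# NEXT[s][a] is the next state; REWARD[s][a] is the immediate reward.
GAMMA = 0.9
NEXT = [[0, 1], [1, 0]]
REWARD = [[0.0, 1.0], [2.0, 0.0]]


def q_iteration(reward, gamma=GAMMA, sweeps=1000):
    """Plain Q-value iteration: Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')."""
    q = [[0.0, 0.0] for _ in NEXT]
    for _ in range(sweeps):
        v = [max(row) for row in q]  # state values under the current Q
        q = [[reward[s][a] + gamma * v[NEXT[s][a]] for a in range(2)]
             for s in range(len(NEXT))]
    return q


def greedy_policy(q):
    """Index of the best action in each state."""
    return [row.index(max(row)) for row in q]


SHIFT = 99999999.0
shifted_reward = [[r + SHIFT for r in row] for row in REWARD]

q_plain = q_iteration(REWARD)
q_shifted = q_iteration(shifted_reward)

# Every Q-value moves by the same amount, so the argmax cannot change.
offset = SHIFT / (1 - GAMMA)

print(greedy_policy(q_plain), greedy_policy(q_shifted))
```

Both policies come out the same, and the two Q-tables differ by `offset` everywhere (up to floating-point rounding).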
That's interesting, thanks!
I agree that this is a very important dynamic. But I also feel like, if someone
says to me, "I keep a kitten in my basement and torture him every second of
every day, but it's no big deal, he must have gotten used to it by now", I mean,
I don't think that reasoning is correct, even if I can't quite prove it or put
my finger on what's wrong. I guess that's what I was trying to get at with that
"evolutionary prior" comment: maybe there's a hardcoded absolute threshold such
that you just can't "get used to" being tortured, set that as your new
baseline, and stop actively disliking it? But I don't know; I need to think
about it more. There's also a book I want to read on the neuroscience of
pleasure and pain, and I've been meaning to look up what endorphins do to
the brain. (And I'm happy to keep chatting here!)
I don't have a full explanation of comparing-to-baseline. At first I was gonna
say "it's just the reward-prediction-error thing I described": if you expect
candy based on your beliefs at 5:05:38, and then you no longer expect candy
based on your beliefs at 5:05:39, then that's a big negative reward prediction
error. (That's because the reward-predictor makes its prediction based on
slightly-stale brain status information.) But that doesn't explain why we
might still feel raw about it 3 minutes later. Maybe it's like this: you had
an active piece-of-a-thought "I'm gonna get candy", and it's contradicted by
the other piece-of-a-thought "no I'm not", but the appealing "I'm gonna get
candy" keeps popping back up for a while and keeps getting crushed by reality,
and the net result is a bad feeling. Or something? I dunno.
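To put numbers on the candy example, here's a minimal sketch using the textbook one-step TD error, delta = r + gamma * V(s') - V(s), as the "reward prediction error". The values (1.0 while candy is expected, 0.0 once it isn't) and the names are made up for illustration; I'm not claiming this is what the brain computes.

```python
def td_error(reward, value_now, value_next, gamma=1.0):
    """One-step reward prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


# Hypothetical value estimates, chosen only to illustrate the point.
V_EXPECT_CANDY = 1.0  # beliefs at 5:05:38: candy is coming
V_NO_CANDY = 0.0      # beliefs at 5:05:39: no candy after all

# The moment the expectation collapses: no reward, value drops from 1 to 0.
delta_lost = td_error(0.0, V_EXPECT_CANDY, V_NO_CANDY)

# Three minutes later: still no candy, but nothing is being revised anymore.
delta_later = td_error(0.0, V_NO_CANDY, V_NO_CANDY)

# For contrast: the candy arrives as predicted, so there is no surprise.
delta_delivered = td_error(1.0, V_EXPECT_CANDY, V_NO_CANDY)

print(delta_lost, delta_later, delta_delivered)
```

In this sketch the error is a single negative blip (`delta_lost` is -1.0) and then zero, which is exactly why it doesn't account for still feeling raw minutes later.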
Oh, I think there's also a thing where the brainstem can force the high-level
planner to think about a certain thing; like if you get poked on the shoulder
it's kinda impossible to ignore. I think I have an idea of what mechanism is
involved here … involving acetylcholine and how specific and con