David Udell

Wiki Contributions

Comments

I don't know, I think of the brain as doing credit assignment pretty well, but we may have quite different definitions of good and bad. Is there an example you were thinking of?

Say that the triggers for pleasure are hardwired. After a pleasurable event, how do only those computations running in the brain that led to pleasure (and not those randomly running computations) get strengthened? After all, the pleasure circuit is hardwired, and can't reason causally about what thoughts led to what outcomes.

(I'm not currently confident that pleasure is exactly the same thing as reinforcement, but the two are probably closely related, and pleasure is a nice and concrete thing to discuss.)

What's to stop the human shards from being dominated and extinguished by the non-human shards? IE is there reason to expect equilibrium?

Nothing except those shards fighting for their own interests and succeeding to some extent.

You probably have many contending values that you hang on to now, and would even be pretty careful with write access to your own values, for instrumental convergence reasons. If you mostly expect outcomes where one shard eats all the others, why do you have a complex balance of values rather than a single core value?

And if it looks like this comes in hindsight by carefully reflecting on the situation, that's not without reinforcement. Your thoughts are scored against whatever it is that the brainstem is evaluating. And same as above, earlier or later, you stumble into some thoughts where the pattern is more clearly attributable, and then the weights change.

Maybe. But your subcortical reinforcement circuitry cannot (easily) score your thoughts. What it can score are the mystery computations that led to hardcoded reinforcement triggers, like sugar molecules interfacing with tastebuds. When you're just thinking to yourself, all of that should be a complete black-box to the brainstem.

I did mention that something is going on in the brain with self-supervised learning, and that's probably training your active computations all the time. Maybe shards can be leveraging this training loop? I'm currently quite unclear on this, though.