Do humans derive values from fictitious imputed coherence?

I don't recall seeing that theory in the first quarter of the book, but I'll look for it later. I somewhat agree with your description of the difference between the theories (at least, as I imagine a predictive processing flavored version). Except, the theories are more similar than you say, in that FIAT would also allow very partial coherentifying, so that it doesn't have to be "follow these goals, but allow these overrides", but can rather be, "make these corrections towards coherence; fill in the free parameters with FIAT goals; leave all the other incoherent behavior the way it is". A difference between the theories (though I don't feel I can pass the PP ITT) is that FIAT allows, you know, agency, as in, non-myopic goal pursuit based on coherent-world-model-building, whereas PP maybe strongly hints against that?

It seems like the thing to do is to look for cases where people pursue their own goals, rather than the goals they would predict they have based on past actions.

I'm confused by this; are these supposed to be mutually exclusive? What's "their own goals"? [After thinking more: Oh like you're saying, here's what it would look like to have a goal that can't be explained as a FIAT goal? I'll assume that in the rest of this comment.]

It needs to be complex enough to not plausibly be a reflex/instinct.

Agreed.

A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal.

I'm not sure I buy that it can't be inferred, even the first time. Maybe you have fairly built-in instincts that aren't about the whole courtship thing, but cause you to feel good when you're around someone. So you seek being around them, and pay attention to them. You try to get them interested in being around you. This builds up the picture of a goal of being together for a long time. (This is a pretty poor explanation as stated; if this explanation works, why wouldn't you just randomly fall in love with anyone you do a favor for? But this is why it's at least plausible to me that the behavior could come from a FIAT-like thing. And maybe that's actually the case with homosexual intercourse in the 1800s.)

The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too".

Maybe courtship is especially like this, but in general, things that are sort of well explained as imitation still seem like admissible falsifications of FIAT, e.g. if there are also pressures against the behavior.

abramdemski · 3y · 20

FIAT (by another name) was previously proposed in the book On Intelligence. The version there had a somewhat predictive-processing-like story where the cortex makes plans by prediction alone; so reflective agency (really meaning: agency arising from the cortex) is entirely dependent on building a self-model which predicts agency. Other parts of the brain are responsible for the reflexes which provide the initial data which the self-model gets built on (similar to your story).

The continuing kick toward higher degrees of agency comes from parts of the brain which have reactions to the predictions made by the cortex. (Otherwise, the cortex just learns to predict the raw reflexes, and we're stuck imitating our baby selves or something along those lines).

It's not clear precisely how all of that works, but basically it means we have a pure predictive system (and much of the time we simply take the predicted actions), plus we have some other stuff (e.g. reflexes, and an RL-ish override system which inhibits and/or replaces the predicted action under some circumstances).

The most obvious version of FIAT which someone might write down after reading your post, on the other hand, is more like: run some IRL technique on your own past actions, and then (most of the time) plan based on the inferred goals, again with some overrides (built-in reflexes).
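To make that "obvious version" concrete, here's a toy sketch. Everything in it is my own illustrative assumption, not something from the post or from On Intelligence: the feature-counting rule is a crude stand-in for a real IRL technique, and the names `infer_goal_weights`, `plan`, and `flinch` are hypothetical.

```python
def infer_goal_weights(history):
    """Crude stand-in for IRL (an assumption, not a real IRL algorithm):
    +1 for each feature of a chosen option, -1 for each feature of a rejected one."""
    weights = {}
    for chosen, rejected in history:
        for feature in chosen:
            weights[feature] = weights.get(feature, 0.0) + 1.0
        for option in rejected:
            for feature in option:
                weights[feature] = weights.get(feature, 0.0) - 1.0
    return weights


def plan(options, weights, reflex=None):
    """Pick the option that best fits the inferred goals, unless a reflex overrides."""
    if reflex is not None:
        forced = reflex(options)
        if forced is not None:
            return forced  # built-in override wins over the inferred goals
    return max(
        options,
        key=lambda name: sum(weights.get(f, 0.0) for f in options[name]),
    )


def flinch(options):
    """Hypothetical reflex: always withdraw if that action is available."""
    return "withdraw" if "withdraw" in options else None


# Past behavior: (features of the chosen option, features of the rejected options).
history = [
    ({"near_friend"}, [{"alone"}]),
    ({"near_friend", "costly"}, [{"alone"}]),
]
weights = infer_goal_weights(history)

options = {"visit": {"near_friend", "costly"}, "stay_home": {"alone"}}
choice = plan(options, weights)

options_with_pain = dict(options, withdraw={"pain_avoid"})
overridden = plan(options_with_pain, weights, reflex=flinch)
```

The point of the toy is only the shape: goals are read off past choices, most decisions follow those imputed goals, and a small set of built-in overrides can preempt them. It says nothing about which inference rule the brain would actually use.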

Anyway.

Here's my attempt to make a probably-false prediction from FIAT, as best I can.

A sort of plausible example is courtship. It's complex, it can't easily be inferred from previous things you did (not the first time you do it, that is), and it agentically orients toward a goal. The problem is, I think it's well-explained as imitation - "I'm a person; the people around me do this and seem really into it; so I infer that I'm really into it too".

So it's got to be a case where someone does something unexpected, even to themselves, which they don't see people do, but which achieves goals-they-plausibly-had-in-hindsight.

Homosexual intercourse in the 1800s??

Christopher Thomas Knight heading off into the woods??

TsviBT · 3y · 10
