tl;dr: Some subagents are more closely managed than others, which makes them, to an extent, instruments of the superagent, giving rise to what looks like the instrumental/terminal goal distinction. Selection on trust avoids the difficulties that normally come with this kind of management, like the inability to do open-ended truth-seeking or exercise free-ranging agency.
(Reply to Richard Ngo on the confusedness of Instrumental vs Terminal goals; posting as a quick top-level post since @the gears to ascension said in personal comms that this seemed like progress.)
Managed vs Unmanaged Agency captures much of the Instrumental vs Terminal Goal categorization
The structure Instrumental vs Terminal was pointing to seems better described as Managed vs Unmanaged Goal-Models. A cognitive process will often want to do things it doesn't have the affordances to directly execute, given the circuits/parts/mental objects/etc it has available. When this happens, it might spin up another shard of cognition/search process/subagent, but giving that shard fully free-ranging agency is generally counterproductive for the parent process.
Managed vs Unmanaged is not a binary the way terminal vs instrumental was; it's a spectrum, though from what I observe the distribution along it looks vaguely bimodal.
Example - But it's hard to get the Coffee if someone is Managing you who doesn't want you to get Coffee
Imagine an agent which wants to Get_Caffeine(), settles on coffee, and runs a subprocess to Acquire_Coffee() — but then the coffee machine is broken and the parent Get_Caffeine() process decides to get tea instead. You don't want the Acquire_Coffee() subprocess to keep fighting, tooth and nail, to make you walk to the coffee shop, let alone start subverting or damaging other processes to try and make this happen!
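To make the distinction concrete, here's a minimal toy sketch (all names and the `parent_approves` mechanism are hypothetical, chosen purely for illustration): the managed `Acquire_Coffee()` routes its plans through the parent's ongoing endorsement before committing, while the unmanaged version just keeps steering toward coffee.

```python
# Toy sketch of a managed vs an unmanaged Acquire_Coffee() subprocess.
# The managed version defers to the parent before escalating to a costlier plan.

from typing import Callable, Optional


def acquire_coffee_unmanaged(machine_broken: bool) -> str:
    """Unmanaged: steers toward coffee regardless of what the parent now wants."""
    return "walk to the coffee shop" if machine_broken else "use the machine"


def acquire_coffee_managed(machine_broken: bool,
                           parent_approves: Callable[[str], bool]) -> Optional[str]:
    """Managed: proposes a plan and defers to the parent's approval."""
    plan = "walk to the coffee shop" if machine_broken else "use the machine"
    return plan if parent_approves(plan) else None  # stand down if unapproved


def get_caffeine(machine_broken: bool) -> str:
    """Parent process: wants caffeine, not coffee specifically."""
    def parent_approves(plan: str) -> bool:
        # The parent only endorses coffee plans that stay cheap.
        return plan == "use the machine"

    plan = acquire_coffee_managed(machine_broken, parent_approves)
    return plan if plan is not None else "get tea instead"


print(get_caffeine(machine_broken=False))  # -> "use the machine"
print(get_caffeine(machine_broken=True))   # -> "get tea instead"
```

The point isn't the plumbing, it's the pattern: the managed subgoal's agency is conditioned on the parent's continued endorsement, so when coffee stops being the best route to caffeine it stands down rather than fighting for the coffee shop.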
But that's the natural state of unmanaged agency! Agents by default will try to steer towards the states they are aiming for, because an agent is a system that models possible futures and selects actions based on the predicted outcomes.