Mateusz Bagiński

~[agent foundations]

Wiki Contributions


I wonder how the following behavioral patterns fit into Shard Theory

  • Many mammalian species have strong default aversion to young of their own species. They (including females) deliberately avoid contact with the young and can even be infanticidal. Physiological events associated with pregnancy (mostly hormones) rewires the mother's brain such that when she gives birth, she immediately takes care of the young, grooms them etc., something she has never done before. Do you think this can be explained by the rewiring of her reward circuit such that she finds simple actions associated with the pups highly rewarding and then bootstraps to learning complex behaviors from that?
  • Salt-starved rats develop an appetite for salt and are drawn to stimuli predictive of extremely salty water, even though on all previous occasions they found it extremely aversive, which caused them to develop conditioned fear response to the cue predictive of salty water. (see Steve's post on this experiment

Standardized communication protocols

Language is the most obvious example, but there's plenty of others. E.g. taking different parts of the body as subsystems communicating with each other, one neurotransmitter/hormone often has very similar effects in many parts of the body.

In software, different processes can communicate with each other by passing messages having some well-defined format. When you're sending an API request, you usually have a good idea of what shape the response is going to take and if the request fails, it should fail in a predictable way that can be harmlessly handled. This makes making reliable software easier.

Some cases of standardization are spontaneous/bottom-up, whereas others are engineered top-down. Human language is both. Languages with greater number of users seem to evolve simpler, more standardized grammars, e.g. compare Russian to Czech or English to Icelandic (though syncretism and promiscuous borrowing may also have had an impact in the second case). I don't know if something like that occurs at all in programming languages but one factor that makes it much less likely is the need to maintain backward-compatibility, which is important for programing languages but much weaker for human languages.

I don't have a specific example right now but some things that come to mind:

  • Both utility functions ultimately depend in some way on a subset of background conditions, i.e. the world state
  • The world state influences the utility functions through latent variables in the agents' world models, to which they are inputs.
  • changes only when (A's world model) changes which is ultimately caused by new observations, i.e. changes in the world state (let's assume that both A and B perceive the world quite accurately).
  • If whenever changes doesn't decrease, then whatever change in the world increased , B at least doesn't care. This is problematic when A and B need the same scarce resources (instrumental convergence etc). It could be satisfied if they were both satisficers or bounded agents inhabiting significantly disjoint niches.
  • A robust solution seems seems to be to make (super accurately modeled) a major input to .

Can't you restate the second one as the relationship between two utility functions and such that increasing one (holding background conditions constant) is guaranteed not to decrease the other? I.e. their respective derivatives are always non-negative for every background condition.

Re second try: what would make a high-level operationalisation of that sort helpful? (operationalize the helpfulness of an operationalisation)