Generalised models: imperfect morphisms and informational entropy

by Stuart Armstrong11 min read9th Jul 2021No comments

6

Logic & Mathematics Probability & StatisticsWorld Modeling
Frontpage

I've defined generalised models as being given by , a set of features, and an (unnormalised) probability distribution over , the set of possible worlds defined by the all values of those features.

To make these into a category, a morphism between and was defined to be a relation between and (ie a subset of ), that obeyed certain conditions with respect to the 's.

If obeys those conditions, we can construct the "underlying model" for the morphism, a generalised model , with non-zero only on .

That underlying model result basically says:

  • There is an underlying reality of which and are different, consistent, facets.

That's well and good, but we also want to allow imperfect correspondences, where or (or both) might be known to be in error. After all, when we transitioned from Newtonian mechanics to relativity, it wasn't because there was an underlying reality that both were facets of. Instead, we realised that Newtonian mechanics and relativity were approximately though not perfectly equivalent in low-energy situations, and that when they diverged, relativity was overall more accurate.

We'd want to include these cases as generalised model, and measure how "imperfect" the morphism between them is, i.e. how much and diverge from being in perfect correspondence. We'll also look at how the and the feature sets are related - how much information caries relative to .

Imperfect morphisms

So we'll loosen the definition of "morphisms" so that the and need not correspond exactly to each other or an underlying reality. If we have a relation between and , here are different -consistency requirements that we could put on to make it into a morphism (the previous requirement has been renamed "-preserving", condition 5):

  1. General binary relations ; no connection assumed between and the s.
  2. -relational: for all with , there exists at least one with both and .
  3. -functional: for all with , there exists a unique with both and .
  4. -birelational: and are both -relational.
  5. -preserving: the same conditions as presented here. For all and , and .
  6. -isomorphic: is -preserving; both and are -functional.

The general results about these morphisms classes are (proof found in this footnote[1]):

  • All the above conditions are associative (hence define classes of morphisms for different categories with the same objects): if and are -relational, -functional, -relational, -birelational, -preserving, -isomorphic or disconnected from entirely, then so is .
  • Every morphism fulfils the conditions of the morphisms above it on the list, except that -preserving and -birelational need not imply -functional.
  • If is -isomorphic, then we can pair up each non-measure zero elements and so that and .

Examples of morphisms

Here are four relations:

A coarsening is when multiple worlds get related to a single world, thus losing details. A refinement is the opposite: a single world gets related to multiple worlds, thus adding more details. An inclusion adds more worlds to the set. Its opposite, a restriction, removes worlds.

In terms of morphisms, if we assume that all the worlds in those sets have non-zero probability, then coarsenings, refinements, and inclusions are all -relational and -functional. Restrictions are neither. Coarsenings and refinements are -birelational, while inclusions and restrictions are not.

As for -preserving, coarsenings are -preserving if, for all , the sum of for all , is equal to . Similarly, refinements are -preserving if, for all , the sum of the for all , is equal . None of these four relations are -isomorphic (unless some of the worlds in the diagrams are of measure ).

What about Bayesian updates? If we start with and want to update on for some constant , then this corresponds to relating to such that if . We'll also require on any such worlds - and assume that that defines entirely (we're ignoring renormalisation here, since we don't assume that and have measure ).

This is clearly a restriction, and hence does not meet any of the -consistency conditions. However, we can define Bayesian updates as relations such that is -functional and injective. They do form a category when seen this way (since ).

Comparing the 's

Given two generalised models and , with a relation between them, we'll now compare and . We'll do this by defining a length operator that gives the "length" of , which is a measure of the divergence between and "along" the relation .

Let be a pair of probability distributions on and , respectively. We'll say the pair is -compatible if is a -preserving morphism between and .

Since and are distributions over the same , we can compare their norm, defined as: . Then define as the sum of the -distances to and :

We'll define to be the minimum[2] value of this norm among all the -compatible . Because of its definition, it's immediately obvious that . The other key properties - proved here[3] - are that:

  • If is -relational, then there exists a such that ; ie we can use itself rather than finding a .
  • If and are -birelational, the is a sensible length operator, in that .
  • is -preserving iff .

It's that last property that makes such a useful distance metric: it measures the extent to which and fail to be aspects of the same underlying reality.

Relating features and probability distributions

In the above we've been looking at the relationship between and , but have looked little at the features. Here we'll look at some of the relations between features and probability distributions. The idea is to measure how related the features are to each other.

There are several candidates for a measure of this types, but the most interesting seems to be a generalisation of mutual information. For any feature , we have the marginal distribution of over the feature . Then if is the informational entropy of a probability distribution/random variable, we can define a measure over given as:

If we define as the product distribution , then that can also be defined as , where is the KL-divergence from to .

Example: gas laws

As an illustration, consider the ideal gas laws: , where is pressure, is volume, is the amount of substance, is the ideal gas constant, and is the temperature. We'll consider a simple model with constant amount of substance, setting , so the law reduces to:

We'll allow these variables to take a few values: and are integers that range from to , while is an integer that ranges from to . The probability distribution will give uniform probability[4] to each .

In this case, is uniform among the worlds where it is non-zero; hence

This characterises , but doesn't touch the features . So, let , , and be the marginal distributions over the features. Both and are uniform over elements, so . As for , it is over , over , and over . Some calculations then establish that is . Then the KL-divergence from to is:

Let us add another variable to the feature set, which is just equal to , but with another name, and see how things change. Then is unchanged, and adds another , corresponding to .


  1. We've already shown that -preserving morphisms are associative. The composition of general binary relations is known to be associative too.

    So now assume that and are both -relational. Let be such that . Then, because r is -relational, there exists a with and . Then since is -relational, there exists a with and . Combining the two gives . Thus -relational morphisms are associative. Applying the same argument to shows that -birelational morphisms are also associative. A variant of the same argument with "there exists a unique " instead of "there exists a " shows that -functional morphisms are also associative. Hence -isomorphic are also functional.

    Now assume that is -preserving and let be such that . Then there exists an underlying model such that . Since this sum is greater than zero, there exists a such that . Then since and all the terms are non-negative, . The same argument works if we started with a such that ; thus must be -birelational. This proves that -preserving implies -birelational.

    It's trivial that -birelational implies -relational, and that -functional also implies -relational, since "exists a unique" is strictly stronger than "exists a". By definition, -isomorphic implies -preserving (hence -birelational and -relational) and -functional.

    Now let be -isomorphic, and let be such that . Since is -functional, there exists a unique with and . Since is also -functional, there are no other with and . So, among the elements of non-zero measure, and are related only to each other. Then since is -preserving, , and . Hence . ↩︎

  2. This will be a minimum, not an infimum. Let be an -compatible pair with . Then if we restrict to pairs with , this is a compact non-empty set, so must reach its minimum on this set. ↩︎

  3. For a relation between and , let be the set of -compatible that minimises . Hence on .

    So we have this non-empty ; what we'll show is that, if is -relational, then there is a such that . We'll need the following lemma:

    • Lemma A: For any in with , we can find another pair with closer (in the norm) to than is.

    To prove the lemma, pick any with . That norm is a sum of positive terms, so there must exist a with .

    Assume first that . Then, since and is a -preserving morphism between and , there is an underlying model so that . Thus there is a with . Pick to be less than and , and define:

    with and on all other points. Because is closer to (by ) than is, . Furthermore, is -preserving between and (the underlying model has the same except increased by on ), and has gone up by at most over ; thus the sum has not increased. Hence .

    Now consider the other case: . Then since and is -relational, there must exist a . We define and as above, except that we add instead of subtracting it; the rest of the argument is the same. This proves the lemma .

    Back to the main proof. Since is compact, the distance to must reach a minimum on . By lemma A, this minimum can only be (since if it were greater than , we could find a pair with a smaller distance to ). If is a pair on which it reaches this minimum, we must have , ie .

    Now assume is -birelational between and , while is -birelational between and . Then . Since and are -relational, there exists and such that this is , and is -compatible, while is -compatible.

    This implies that is -compatible, and thus . However, , giving our result:

    Finally, notice that implies that there exists an compatible with and . Thus themselves are -compatible, ie is -preserving. Conversely, if are -compatible, then , and thus . ↩︎

  4. Note that this means that many values of are impossible in this model, such as and , which cannot be expressed as the product of integers or less. ↩︎

6

New Comment