When constructing a high-level abstract causal DAG from a low-level DAG, one operation which comes up quite often is throwing away information from a node. This post is about how to do that.
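First, how do we throw away information from random variables in general? Sparknotes: we replace a random variable $X$ by $f(X)$ for some function $f$, and $f(X)$ retains all the information in $X$ relevant to another variable $Y$ exactly when $P[Y \mid f(X)] = P[Y \mid X]$.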
For more explanation of this, see Probability as Minimal Map.
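As a concrete check of the idea, here is a minimal sketch. It assumes the usual criterion that $f(X)$ keeps everything in $X$ relevant to $Y$ when $P[Y \mid f(X)] = P[Y \mid X]$. The toy distribution and the names `joint`, `conditional_of_y` and `retains_all_info` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy joint distribution P[X, Y]: X is uniform on {-1, 0, 1},
# and Y depends on X only through X**2. All numbers are made up.
p_y1_given_x = {-1: Fraction(9, 10), 0: Fraction(1, 5), 1: Fraction(9, 10)}
joint = {}
for x, p_y1 in p_y1_given_x.items():
    joint[(x, 1)] = Fraction(1, 3) * p_y1
    joint[(x, 0)] = Fraction(1, 3) * (1 - p_y1)

def conditional_of_y(summary):
    """Return P[Y = 1 | summary(X) = s] for each value s of the summary."""
    total = defaultdict(Fraction)
    y_is_1 = defaultdict(Fraction)
    for (x, y), p in joint.items():
        total[summary(x)] += p
        if y == 1:
            y_is_1[summary(x)] += p
    return {s: y_is_1[s] / total[s] for s in total}

def retains_all_info(f):
    """Check that P[Y | f(X)] equals P[Y | X] for every value of X."""
    full = conditional_of_y(lambda x: x)
    reduced = conditional_of_y(f)
    return all(reduced[f(x)] == full[x] for x in full)

square = lambda x: x * x
is_positive = lambda x: x > 0
print(retains_all_info(square))       # True: the sign of X is irrelevant to Y
print(retains_all_info(is_positive))  # False: this merges X = -1 with X = 0
```

Exact fractions are used so that the equality check is not disturbed by floating-point rounding.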
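For our purposes, starting from a low-level causal DAG, we want to pick a node $X_i$ and a set of node indices $\bar{S}$ around it, then replace $X_i$ by $f(X_i)$ for some function $f$ such that $P[X_S \mid f(X_i)] = P[X_S \mid X_i]$.
Here $S$ denotes all the node indices outside $\bar{S}$. (Specifying $\bar{S}$ rather than $S$ directly will usually be easier in practice, since $\bar{S}$ is usually a small neighborhood of nodes around $X_i$.) In English: we want to throw away information from $X_i$, while retaining all information relevant to nodes outside the set $\bar{S}$.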
Two prototypical examples:
In both examples, we’re throwing out “local” information, while maintaining any information which is relevant “globally”. This will mean that local queries - e.g. the voltage in one wire given the voltage in a neighboring wire at the same time - are not supported; short-range correlations violate the abstraction. However, large-scale queries - e.g. the voltage in a wire now given the voltage in a wire a few seconds ago - are supported.
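The local versus global distinction can be sketched on a made-up three-node chain $X_i \to X_{near} \to X_{far}$. Everything here is an assumption for illustration: the mechanisms `p_near` and `p_far`, the probabilities, and the choice $f(x) = x^2$. The check asks whether conditioning on $f(X_i)$ gives the same answer as conditioning on $X_i$ itself.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical chain X_i -> X_near -> X_far; all numbers are made up.
def p_near(near, x_i):
    # X_near is a noisy copy of X_i, so it still carries the sign of X_i.
    return Fraction(4, 5) if near == x_i else Fraction(1, 10)

def p_far(far, near):
    # X_far is a noisy copy of X_near**2, so the sign never reaches it.
    return Fraction(9, 10) if far == near * near else Fraction(1, 10)

joint = {
    (x_i, near, far): Fraction(1, 3) * p_near(near, x_i) * p_far(far, near)
    for x_i, near, far in product((-1, 0, 1), (-1, 0, 1), (0, 1))
}
NEAR, FAR = 1, 2  # positions of the two downstream nodes in each outcome tuple

def conditional(target, summary):
    """P[X_target | summary(X_i)], as {summary value: {target value: prob}}."""
    mass = defaultdict(lambda: defaultdict(Fraction))
    for outcome, p in joint.items():
        mass[summary(outcome[0])][outcome[target]] += p
    return {s: {t: p / sum(row.values()) for t, p in row.items()}
            for s, row in mass.items()}

def supports_query(target, f):
    """True if P[X_target | f(X_i)] equals P[X_target | X_i] for every x_i."""
    full = conditional(target, lambda x: x)
    reduced = conditional(target, f)
    return all(reduced[f(x)] == full[x] for x in full)

f = lambda x: x * x
print(supports_query(FAR, f))   # True: the long-range query survives
print(supports_query(NEAR, f))  # False: the short-range query does not
```

In this toy, $X_{near}$ plays the role of the neighborhood whose correlations with $X_i$ are sacrificed, and $X_{far}$ plays the role of everything outside it.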
We still have one conceptual question to address: when we replace $X_i$ by $f(X_i)$, how do we modify the child nodes of $X_i$ to use $f(X_i)$ instead?
The first and most important answer is: it doesn’t matter, so long as whatever they do is consistent with $f(X_i)$. For instance, suppose $X_i$ ranges over {-1, 0, 1}, and $f(X_i) = X_i^2$. When $f(X_i) = 1$, the children can act as though $X_i$ were -1 or 1 - it doesn’t matter which, so long as they don’t act like $X_i = 0$. As long as the children’s behavior is consistent with the information in $f(X_i)$, we will be able to support long-range queries.
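There is one big catch, however: the children do need to all behave as if $X_i$ had the same value, whatever value they choose. The joint distribution $P[X_{ch(i)} \mid X_{sp(i)}, f(X_i)]$ (where $ch(i)$ = children of $X_i$ and $sp(i)$ = spouses of $X_i$) must be equal to $P[X_{ch(i)} \mid X_{sp(i)}, X_i = x_i^*]$ for some value $x_i^*$ consistent with $f(X_i)$. The simplest way to achieve this is to pick a particular “representative” value $x_i^*(f^*)$ for each possible value $f^*$ of $f(X_i)$, so that $f(x_i^*(f^*)) = f^*$.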
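To see why the children must agree, here is a small sketch under made-up assumptions: a node $X_i$ on {-1, 0, 1} with two children that are each noisy copies of it, a far node $F$ equal to the product of the children, and $f(x) = x^2$. The names `p_child`, `far_given_parent_values` and `abstract_model` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

VALUES = (-1, 0, 1)

def p_child(c, x):
    """Hypothetical child mechanism: a noisy copy of the parent value x."""
    return Fraction(4, 5) if c == x else Fraction(1, 10)

def far_given_parent_values(x1, x2):
    """Distribution of F = C1 * C2 when child 1 acts as if X_i = x1
    and child 2 acts as if X_i = x2."""
    dist = defaultdict(Fraction)
    for c1, c2 in product(VALUES, VALUES):
        dist[c1 * c2] += p_child(c1, x1) * p_child(c2, x2)
    return dict(dist)

f = lambda x: x * x

# Low-level model: both children see the true value of X_i.
low_level = {x: far_given_parent_values(x, x) for x in VALUES}

def abstract_model(rep1, rep2):
    """P[F | f(X_i) = s] when child k substitutes rep_k[s] for X_i."""
    return {s: far_given_parent_values(rep1[s], rep2[s]) for s in rep1}

def matches_low_level(abstract):
    """True if the abstract model reproduces P[F | X_i = x] for every x."""
    return all(abstract[f(x)] == low_level[x] for x in VALUES)

consistent = abstract_model({0: 0, 1: 1}, {0: 0, 1: 1})
also_consistent = abstract_model({0: 0, 1: -1}, {0: 0, 1: -1})
inconsistent = abstract_model({0: 0, 1: 1}, {0: 0, 1: -1})
print(matches_low_level(consistent))       # True
print(matches_low_level(also_consistent))  # True
print(matches_low_level(inconsistent))     # False
```

Either representative (1 or -1) works for the summary value 1, as long as both children use the same one. When they disagree, the children become anti-correlated and the distribution of their product changes.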
Example: in the digital circuit case, we would pick one representative “high” voltage (for instance the supply voltage $V_{DD}$) and one representative “low” voltage (for instance the ground voltage $V_{SS}$). $x_i^*(f(X_i))$ would then map any high voltage to $V_{DD}$ and any low voltage to $V_{SS}$.
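A minimal sketch of the digital circuit case follows. The specific numbers (a 5 V supply, a 0 V ground, a midpoint threshold) are assumptions for illustration only, since real logic families define their own thresholds, and the names `f` and `representative` are hypothetical.

```python
V_DD = 5.0  # assumed supply voltage, in volts
V_SS = 0.0  # assumed ground voltage, in volts
THRESHOLD = (V_DD + V_SS) / 2  # assumed logic threshold

def f(voltage):
    """Throw away the analog detail, keeping only the logic level."""
    return "high" if voltage > THRESHOLD else "low"

REPRESENTATIVE = {"high": V_DD, "low": V_SS}

def representative(level):
    """Pick one concrete voltage for each logic level."""
    return REPRESENTATIVE[level]

for v in (4.7, 3.1, 0.4):
    print(v, "->", representative(f(v)))

# Each representative is consistent with the summary it stands for.
for level in REPRESENTATIVE:
    assert f(representative(level)) == level
```

The final loop checks the consistency requirement: applying `f` to a representative voltage gives back the logic level it represents.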
Once we have our representative value function $x_i^*$, we just have the children use $x_i^*(f(X_i))$ in place of $X_i$.
If we want, we could even simplify one step further: we could just choose $f$ to spit out representative values directly. That convention is cleaner for proofs and algorithms, but a bit more confusing for human usage and examples.
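As a small illustration of that convention, assume a variable ranging over {-1, 0, 1} where only the square matters. A hypothetical `f_direct` can return a representative value itself instead of a separate summary:

```python
def f_summary(x):
    """Summary-valued version: returns x**2."""
    return x * x

def f_direct(x):
    """Representative-valued version: returns a value the variable could take."""
    return abs(x)

values = (-1, 0, 1)
for x in values:
    # The output is a legal value of the variable, and re-applying f changes nothing.
    assert f_direct(x) in values
    assert f_direct(f_direct(x)) == f_direct(x)
    for y in values:
        # Both versions merge exactly the same pairs of values,
        # so they throw away exactly the same information.
        assert (f_direct(x) == f_direct(y)) == (f_summary(x) == f_summary(y))
```

With this convention the children need no modification at all: they read `f_direct(x)` exactly as they would have read `x`.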