*This is a longer, more mathematically dense introduction to "Finite Factored Sets." For a shorter introduction, see **my Topos talk/transcript**.*

*Abstract:* We propose a new approach to temporal inference, inspired by the Pearlian causal inference paradigm—though quite different from Pearl's approach formally. Rather than using directed acyclic graphs, we make use of *factored sets*, which are sets expressed as Cartesian products. We show that finite factored sets are powerful tools for inferring temporal relations. We introduce an analog of *d*-separation for factored sets, *conditional orthogonality*, and we demonstrate that this notion is equivalent to conditional independence in all probability distributions on a finite factored set.

# 1. Introduction

## 1.1. Pearlian Causal Inference

Judea Pearl's theory of inferred causation (e.g., as presented in chapter 2 of *Causality: Models, Reasoning, and Inference*) was a deep advance in our understanding of the nature of time. The Pearlian paradigm allows us to infer causal relationships between variables using statistical data, and thereby infer temporal sequence—in defiance of the old adage that correlation does not imply causation.

In particular, given *a collection of variables* and *a joint probability distribution over those variables*, the Pearlian paradigm can often infer temporal relationships between the variables.

The joint probability distribution is usually what gets emphasized in discussions of Pearl's approach. Quite a bit of work is being done, however, by the assumption that we are handed "a collection of variables" to reason about. The Pearlian paradigm is not inferring temporal relationships from purely statistical data, but rather inferring temporal relationships from statistical data together with data about how to factorize the world into variables.[1]

A doctor who misdiagnoses their patient or misidentifies a symptom may base their subsequent reasoning on a wrong factorization of the situation into causally relevant variables. We would ideally like to build fewer assumptions like this into our model of inference, and instead allow the reasoner to figure such facts out, consider the merits of different factorizations into variables, etc.

Instead of beginning with a collection of variables and a joint probability distribution over those variables, one could imagine starting with just a finite sample space and a probability distribution on that sample space. In this way, we might hope to do temporal inference purely using statistical data, without relying on *a priori* knowledge of a canonical way of factoring the situation into variables.

How might one do temporal inference without an existing factorization? One way might be to just consider all possible variables that can be defined on the sample space. This gives us one variable for each partition of the set.

However, when one tries to apply Pearl's methods to this collection of variables, one quickly runs into a problem: many of the variables definable on a fixed set are deterministic functions of each other. The Pearlian paradigm, as presented in the early chapters of *Causality*, lacks tools for performing temporal inference on variables that are highly deterministically related.[2]

We will introduce a new approach to temporal inference instead—one which is heavily inspired by the Pearlian paradigm, but approaches the problem with a very different formal apparatus, and does not make use of graphical models.

## 1.2. Overview

We'll begin by introducing the concept of a finite factored set, in Section 2. This will be our analogue of the directed acyclic graphs in Pearl's framework.

In Section 3, we will introduce the concepts of time and orthogonality, which can be read off of a finite factored set. In Pearl's framework, "time" corresponds to directed paths between nodes, and "orthogonality" corresponds to nodes that have no common ancestor.

In Section 4, we will introduce conditional orthogonality, which is our analogue of *d*-separation. We show that conditional orthogonality satisfies (a modified version of) the compositional semigraphoid axioms. We then (in Section 5) prove the fundamental theorem of finite factored sets, which states that conditional orthogonality is equivalent to conditional independence in all probability distributions on the finite factored set.

In Section 6, we discuss how to do temporal inference using finite factored sets, and give two examples. Finally, in Section 7 we discuss applications and future work, with an emphasis on temporal and conceptual inference, generalizing finite factored sets to the infinite case, and applications to embedded agency.

And here, we take our leave of Pearl. We've highlighted this approach's relationship to the Pearlian paradigm in order to motivate finite factored sets and explain how we'll be using them in this sequence. Formally, however, our approach is quite unlike Pearl's, and the rest of the sequence will stand alone.

# 2. Factorizations

Before giving a definition of finite factored sets, we will recall the definition of a partition, and give some basic notation related to partitions.

We do this for two reasons. First, we will use partitions in the definition of a factored set; and second, we want to draw attention to a duality between the notion of a partition, and the notion of a factorization.

## 2.1. Partitions

We begin with a definition of disjoint union.

**Definition 1** (disjoint union). *Given a set ** of sets, let ** denote the set of all ordered pairs **, where ** and **.*[3]

**Definition 2** (partition). *A partition of a set ** is a set ** of nonempty subsets of ** such that the function ** given by ** is a bijection.*[4]

*Let ** denote the set of all partitions of **. The elements of a partition are called parts.*

An equivalent definition of partition is often given: a partition is a set of nonempty subsets of that are pairwise disjoint and union to . We choose the above definition because it will make the symmetry between partitions and factorizations more obvious.

**Definition 3 **(trivial partition). *A partition ** of a set ** is called trivial if **.*

**Definition 4**. *Given a partition ** of a set **, and an element **, let ** denote the unique ** such that **.*

**Definition 5**. *Given a partition ** of a set **, and elements **, we say ** if **.*

**Proposition 1. ***Given a partition ** of a set **, ** is an equivalence relation on **.*

*Proof. *Trivial.

**Definition 6** (finer and coarser). *We say that a partition ** of ** is finer than another partition ** of **, if for all **, if **, then **.*

*If ** is finer than **, we also say ** is coarser than **, and we write ** and **.*

**Definition 7** (discrete and indiscrete partitions). *Given a set **, let **.*

*If ** is empty, let **, and if ** is nonempty, let **.*

* is called the discrete partition, and ** is called the indiscrete partition.*

**Proposition 2**. *For any set **, ** is a partial order on **. Further, for all **, ** and **.*

*Proof. *Trivial.

While both notations are sometimes used, it is more standard to draw the symbol in the opposite direction and have when is finer than . We choose to go against that standard because we want to think of partitions in part as the ability to distinguish between elements, and finer partitions correspond to greater ability to distinguish.[5]

**Definition 8 **(common refinement). *Given a set ** of partitions of a fixed set **, let ** denote the partition ** satisfying ** if and only if ** for all **. Given **, we let **.*

## 2.2. Factorizations

We start with a definition of Cartesian product.

**Definition 9 **(Cartesian product). *Given a set ** of sets, let ** denote the set of all functions ** such that for all **, ** is of the form **, for some **.*

We can now give the definition of a factorization of a set.

**Definition 10** (factorization). *A factorization of a set ** is a set ** of nontrivial partitions of ** such that the function **, given by **, is a bijection.*

*Let ** denote the set of all factorizations of **. The elements of a factorization are called factors.*

In other words, a set of nontrivial partitions is a factorization of if for each way of choosing one part from each factor, there exists a unique element of in the intersection of those parts.

Notice the duality between the definitions of partition and factorization. We replace subsets with partitions, nonempty with nontrivial, and disjoint union with Cartesian product, and we reverse the direction of the function. We can think of a factorization of as a way to view as a product, in the same way that a partition was a way to view as a disjoint union.

A factored set is just a set together with a factorization of that set.

**Definition 11 **(factored set). *A factored set ** is an ordered pair **, such that ** is a factorization of **.*

*If ** is a factored set, we let **, and let **.*

An important fact about factored sets is that the factors are enough to distinguish distinct elements.

**Proposition 3**. *Given a factored set **, and elements **, if ** for all **, then **.*

*Proof. *Let be a finite factored set, and let satisfy for all .

Let be given by , as in the definition of factorization. Then . Since is bijective, this means .

## 2.3. Chimera Functions

The following theorem can be viewed as an alternate characterization of factorization. We will use this alternate characterization to define chimera functions, which will be useful tools for manipulating elements of factored sets.

**Theorem 1**. *Given a set **, a set ** of nontrivial partitions of ** is a factorization of ** if and only if for every function **, there exists a unique ** such that for all **, **.*

*Proof.* First, we let be a factorization of , and let be any function. We want to show that there exists a unique such that for all , . Let be given by , as in the definition of factorization. Note that is bijective, and thus has an inverse.

Let . Observe that this is well-defined, because is in fact in . We will show that for all , and the uniqueness of this will then follow directly from Proposition 3.

We have by the definition of . However, we also have by the definition of . Thus, and are the same function, so for all , so for all .

Conversely, let be any set, and let be any set of nontrivial partitions of . Assume that for all , there exists a unique satisfying for . Again, let be given by , as in the definition of factorization. We want to show that is invertible.

First, we show that is injective. Take an arbitrary , and let be the constant function satisfying for all . Given another , if , then , so for all , so for all . Since there is a unique satisfying for all , this means . Thus is injective.

To see that is surjective, consider some arbitrary . We want to show that there exists an with .

For all , let be given by , which is well-defined since . Note that is a nonempty subset of , so there exists a function with for all . Fix any such , and let satisfy for all .

We thus have that for all , , so . Thus is surjective.

Since is bijective, we have that is a factorization of .

This also gives us that factors are disjoint from each other.

**Corollary 1**. *Given a factored set ** and distinct factors **, **.*

*Proof. *Assume by way of contradiction that . Since is nontrivial, there must be some other with . Let be any function such that and . Then there can be no such that and , since then would be in both and . This contradicts Theorem 1.

We are now ready to define the chimera function of a factored set.

**Definition 12 **(chimera function). *Given a factored set **, the chimera function (of **) is the function ** defined by ** for all ** and **.*

The name "chimera function" comes from the fact that can be viewed as building an element of by fusing together the properties of various different elements. Since we will often apply the chimera function to functions that only take on two values, we will give notation for this special case.

**Definition 13**. *Given a factored set **, and a subset **, let ** be given by **, where ** is given by ** if **, and ** otherwise.*

*For **, we will write ** for **.*

The following is a list of properties of , which will be useful in later proofs. All of these properties follow directly from the definition of .

**Proposition 4**. *Fix **, a factored set, **, and **.*

*for all**.**for all**.*- .
- .
- .
- .
- .
- .
- .
- .
- .

*Proof. *Trivial.

## 2.4. Trivial Factorizations

We now define a notion of a trivial factorization of a set, and show that every set has a unique trivial factorization.

**Definition 14 **(trivial factorization). *A factorization ** of a set ** is called trivial if **. A factored set ** is called trivial if ** is trivial.*

**Proposition 5**. *For every set **, there exists a unique trivial factorization ** of **. If **, this trivial factorization is given by **, and if **, it is given by **.*

*Proof. *We start with the case where . The only partition of is , so we only need to consider the sets of partitions and as potential factorizations. is vacuously a factorization of by Theorem 1, since there are no functions from to . is not a factorization by Theorem 1, since there is a function from to , but there is no element of . Thus, when , is the unique trivial factorization of .

Next, consider the case where . First, observe that the unique vacuously satisfies for all and , since there is no . Thus, by Theorem 1, is a factorization of . Further, is the only factorization of , since there are no nontrivial partitions of . Thus, when , is the unique trivial factorization of .

Next, we consider the case where . Observe that is a nontrivial partition of . Let . We want to show that is a factorization of . By Theorem 1, it suffices to show that for all , there exists a unique with . We can take , which clearly satisfies . This is unique, since if , then , so . Thus is a factorization of .

On the other hand, if , is not a factorization of , since if it were, Proposition 3 would imply that all elements of are equal. Further, for any partition of , with , there must exist , with , but . Thus cannot be a factorization of by Proposition 3. Thus when , is the unique trivial factorization of .

## 2.5. Finite Factored Sets

This sequence will primarily be about finite factored sets.

**Definition 15**. *If ** is a factored set, the size of **, written **, is the cardinality of **. The dimension of **, written **, is the cardinality of **. ** is called finite if its size is finite, and finite-dimensional if its dimension is finite.*

We suspect that the theory of infinite factored sets is both interesting and important. However, it is outside of the scope of this sequence, which will require finiteness for many of its key results.

Some of the definitions and results in this sequence will be given for finite factored sets, in spite of the fact that they could easily be extended to finite-dimensional or arbitrary factored sets. This is because they can often be extended in more than one way, and determining which extension is most natural requires further developing the theory of arbitrary factored sets.

**Proposition 6**. *Every finite factored set is also finite-dimensional.*

*Proof. *If is a factored set, is a set of sets of subsets of . Thus, .

This bound is horrible and will be improved in Proposition 9. First, however, we will take a look at the number of factorizations of a fixed finite set.

**Proposition 7**. *Let ** be a finite factored set. Then **.*

*Proof.* Trivial.

**Proposition 8**. *If ** is equal to **, **, or a prime, the trivial factorization of ** is the only factorization of **.*

*Proof*. If or , then , so can have cardinality at most 1.

If , a prime, then by Proposition 7, must divide for all . Since factorizations cannot contain trivial partitions, this means for all . However, is the only element of of cardinality , so .

On the other hand, in the case where is finite and composite, the number of factorizations of grows very quickly, as seen in Table 1. Table 1 shows the number of factorizations of a set with cardinality up to 25:

0 | 1 | 13 | 1 | |

1 | 1 | 14 | 8648641 | |

2 | 1 | 15 | 1816214401 | |

3 | 1 | 16 | 181880899201 | |

4 | 4 | 17 | 1 | |

5 | 1 | 18 | 45951781075201 | |

6 | 61 | 19 | 1 | |

7 | 1 | 20 | 3379365788198401 | |

8 | 1681 | 21 | 1689515283456001 | |

9 | 5041 | 22 | 14079294028801 | |

10 | 15121 | 23 | 1 | |

11 | 1 | 24 | 4454857103544668620801 | |

12 | 13638241 | 25 | 538583682060103680001 |

Given the naturalness of the notion of factorization, we were surprised to discover that this sequence did not exist on the On-Line Encyclopedia of Integer Sequences (OEIS). We added the sequence, A338681, on April 30, 2021.

To give one concrete example, the four factorizations of the set are:

- ,
- ,
- , and
- .

**Proposition 9**. *Let ** be a finite factored set. *

*If**, then**.**If**, then**.**If**is prime, then**.**If**is a product of**primes, then**.*

*Proof. *The first three parts follow directly from Proposition 5 and Proposition 8. For the fourth part, let , and let be a product of primes.

By Proposition 7, . Consider an arbitrary . Since is a nontrivial partition of a finite set , is finite and . If were , then would be . Thus is a natural number greater than or equal to . cannot be empty, since . If were greater than , then we would be able to express as a product of more than natural numbers greater than or equal to , which is clearly not possible since is a product of primes. Thus .

In the next post, we will introduce the notions of the history of a partition, orthogonality between partitions, and time.

**Acknowledgments: **My thanks to Alex Appel, Ramana Kumar, Xiaoyu He, Tsvi Benson-Tilsen, Andrew Critch, Sam Eisenstat, Rob Bensinger, and Claire Wang for discussion and feedback on this sequence.

# Footnotes

[1] Although I say "factorize" here, note that this will not be the kind of factorization that shows up in finite factored sets, because (as we will see) disjoint factors must be independent in a finite factored set. I appeal to the same concept in both contexts because factorization is just a very general and useful concept, rather than to indicate a direct connection.

[2] At least, it lacks such causal inference tools unless we assume access to interventional data.

[3] Note that this definition and Definition 9 could have been made more general by taking to be a multiset.

[4] denotes the power set of .

[5] In our view, "" is also a more natural way to visually represent a mapping between a three-part partition that is finer than a two-part partition .

The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.

In section 2.1 , Proposition 2 should presumably say that ≥S is a partial order on Part(S) rather than on S .

Fixed, Thanks.