Here is how I'm currently thinking about this framework and especially inference, in case it's helpful for other folks who have similar priors to mine (or in case something is still wrong).
A description of traditional causal models:
In the factored sets framework:
I think the definition of history is the most natural way to recover something like causal structure in these models.
I'm not sure how much it's about causality. Imagine there's a bunch of envelopes with numbers inside, and one of the following happens:
Alice peeks at three envelopes. Bob peeks at ten, which include Alice's three.
Alice peeks at three envelopes and tells the results to Bob, who then peeks at seven more.
Bob peeks at ten envelopes, then tells Alice the contents of three of them.
Under the FFS definition, Alice's knowledge in each case is "strictly before" Bob's. So it seems less of a causal relationship and more like "depends on fewer basic facts".
Agree it's not totally right to call this a causal relationship.
That said:
So I do feel like there's a case t...
I think I (at least locally) endorse this view, and I think it is also a good pointer to what seems to me to be the largest crux between the my theory of time and Pearl's theory of time.
Makes sense. I think a bit of my naming and presentation was biased by being so surprised by the not on OEIS fact.
I think I disagree about the bipartite graph thing. I think it only feels more natural when comparing to Pearl. The talk frames everything in comparison to Pearl, but I think if you are not looking at Pearl, I think graphs don’t feel like the right representation here. Comparing to Pearl is obviously super important, and maybe the first introduction should just be about the path from Pearl to FFS, but once we are working within the FFS ontology, graphs feel not useful. One crux might be about how I am excited for directions that are not temporal inference from statistical data.
My guess is that if I were putting a lot of work into a very long introduction for e.g. the structure learning community, I might start the way you are emphasizing, but then eventually convert to throwing all the graphs away.
(The paper draft I have basically only ever mentions Pearl/graphs for motivation at the beginning and in the applications section.)
Planned summary for the Alignment Newsletter. I'd be particularly keen for someone to check that I didn't screw up in the section "Orthogonality in Finite Factored Sets":
...This newsletter is a combined summary + opinion for the [Finite Factored Sets sequence](https://www.alignmentforum.org/s/kxs3eeEti9ouwWFzr) by Scott Garrabrant. I (Rohin) have taken a lot more liberty than I usually do with the interpretation of the results; Scott may or may not agree with these interpretations.
## Motivation
One view on the importance of deep learning is that it allows you to automatically _learn_ the features that are relevant for some task of interest. Instead of having to handcraft features using domain knowledge, we simply point a neural net at an appropriate dataset, and it figures out the right features. Arguably this is the _majority_ of what makes up intelligent cognition; in humans it seems very analogous to [System 1](https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow), which we use for most decisions and actions. We are also able to infer causal relations between the resulting features.
Unfortunately, [existing models](https://en.wikipedia.org/wiki/The_Book_of_Why) of causal inference d
Are there any interesting pure combinatorics problems about finite factored sets that you're interested in?
I've thought a good amount about Finite Factored Sets in the past year or two, but I do sure keep going back to thinking about the world primarily in the form of Pearlian causal influence diagrams, and I am not really sure why.
I do think this one line by Scott at the top gave me at least one pointer towards what was happening:
but I'm trained as a combinatorialist, so I'm giving a combinatorics talk upfront.
In the space of mathematical affinities, combinatorics is among the branches of math I feel most averse to, and I think that explains a good...
I was thinking of some terminology that might make it easier to thinking about factoring and histories and whatnot.
A partition can be thought of as a (multiple-choice) question. Like for a set of words, you could have the partition corresponding to the question "Which letter does the word start with?" and then the partition groups together elements with the same answer.
Then a factoring is set of questions, where the set of answers will uniquely identify an element. The word that comes to mind for me is "signature", where an element's signature is the set o...
Let , where and
[...] The second rule says that is orthogonal to itself
Should that be "is not orthogonal to itself"? I thought the meant non-orthogonal, so would think means that is not orthogonal to itself.
(The transcript accurately reflects what was said in the talk, but I'm asking whether Scott misspoke.)
You said that you thought that this could be done in a categorical way. I attempted something which appears to describe the same thing when applied to the category FinSet , but I'm not sure it's the sort of thing you meant by when you suggested that the combinatorial part could potentially be done in a categorical way instead, and I'm not sure that it is fully categorical.
Let S be an object.
For i from 1 to k, let be an object, (which is not anything isomorphic to the product of itself with itself, or at least is not the terminal object) .
Let&n...
Some general comments:
You mention above that Pearl's ontology 'has blinded us to the obvious next question'. I am very sympathetic to research programmes that try to overcome such blindness, this is the kind or research I have been doing myself recently. The main type of blindness that I have been trying to combat is blindness to complex types of self-referencing and indirect representation that can be present inside online machine learning agents, specifically in my recent work I have added a less blind viewpoint by modifying and...
I was thinking about the difficulty of finite factored sets not understanding the uniform distribution over 4 elements, and it makes me feel like something fundamental needs to be recast. An analogy came to mind about eigenvectors vs. eigenspaces.
What we might like to be true about the unit eigenvectors of a matrix is that they are the unique unit vectors for which the linear transformation preserves direction. But if two eigenvectors have the same eigenvalue, the choice of eigenvectors is not unique--we could choose any pair on that plane. So really, it s...
And if X is independent of X XOR Y, we’re actually going to be able to conclude that X is before Y!
It's interesting to translate that to the language of probabilities. For example, your condition holds for any X,Y (possibly dependent) such that P(X)=P(Y)=1/2, but it doesn't make sense to say that X is before Y in every such pair. For a real world example, take X = "person has above median height" and Y = "person has above median age".
Well, imagine we have three boolean random variables. In "general position" there are no independence relations between them, so we can't say much. Constrain them so two of the variables are independent, that's a bit less "general", and we still can't say much. Constrain some more so the xor of all three variables is always 1, that's even less "general", now we can use your method to figure out that the third variable is downstream of the first two. Constrain some more so that some of the probabilities are 1/2, and the method stops working. What I'd like to understand is the intuition, which real world cases have the particular "general position" where the method works.
Hmm, first I want to point out that the talk here sort of has natural boundaries around inference, but I also want to work in a larger frame that uses FFS for stuff other than inference.
If I focus on the inference question, one of the natural questions that I answer is where I talk about grue/bleen in the talk.
I think for inference, it makes the most sense to think about FFS relative to Pearl. We have this problem with looking at smoking/tar/cancer, which is what if we carved into variables the wrong way. What if instead of tar/cancer, we had a variable for "How much bad stuff is in your body?" and "What is the ratio of tar to cancer?" The point is that our choice of variables both matters for the Pearlian framework, and is not empirically observable. I am trying to do all the good stuff in Pearl without the dependence on the variables
Indeed, I think the largest crux between FFS and Pearl is something about variable realism. To FFS, there is no realism to a variable beyond its information content, so it doesn't make sense to have two variables X, X' with the same information, but different temporal properties. Pearl's ontology, on the other hand, has these graphs with variabl...
"Well, what if I take the variables that I'm given in a Pearlian problem and I just forget that structure? I can just take the product of all of these variables that I'm given, and consider the space of all partitions on that product of variables that I'm given; and each one of those partitions will be its own variable.
How can a partition be a variable? Should it be "part" instead?
Definition paraphrasing attempt / question:
Can we say "a factorization B of a set S is a set of nontrivial partitions of S such that " (cardinality not taken)? I.e., union(intersect(t in tuple) for tuple in cartesian_product(b in B)) = S. I.e., can we drop the intermediate requirement that each intersection has a unique single element, and only require the union of the intersections is equal to S?
Curated. This is a fascinating framework that (to the best of my understanding) makes substantive improvements on the Pearlian paradigm. It's also really exciting that you found a new simple sequence.
Re: the writeup, it's explained very clearly, the Q&A interspersed is a very nice touch. I like that the talk factorizes.
I really appreciate the research exploration you do around ideas of agency and am very happy to celebrate the writeups like this when you produce them.
I'm confused by the definition of conditional history, because it doesn't seem to be a generalisation of history. I would expect , but both of the conditions in the definition of are vacuously true if . This is independent of what is, so . Am I missing something?
I don't understand the motivation behind the fundamental theorem, and I'm wondering if you could say a bit more about it. In particular, it suggests that if I want to propose a family of probability distributions that "represent" observations somehow ("represents" maybe means in the sense of Bayesian predictions or in the sense of frequentist limits), I want to also consider this family to arise from some underlying mutually independent family along with some function. I'm not sure why I would want to propose an underlying family and a function at all, and...
Hmm, I am not sure what to say about the fundamental theorem, because I am not really understanding the confusion. Is there something less motivating about the fundamental theorem, than the analogous theorem about d-seperation being equivalent to conditional independence in all distributions comparable with the DAG?
Maybe this helps? (probably not): I am mostly imagining interacting with only a single distributions in the class, and the claim about independence in all probability distributions comparable with the structure can be replaced with instead independence in a general position probability distribution comparable with the structure.
I am not thinking of it as related to a maximum entropy argument.
The point about SEMs having more structure that I am ignoring is correct. I think that the largest philosophical difference between my framework and Pearlian one is that I am denying realism of anything beyond the "apparently identical." Another way to think about it is that I am denying realism about there being anything about the variables beyond their information. All of my definitions are only based on the information content of the variables, and so, for example, if you have two...
I'm using some of the terminology I suggested here.
A factoring is a set of questions such that each signature of possible answers identifies a unique element. In 20 questions, you can tailor the questions depending on the answers to previous questions, and ultimately each element will have a bitstring signature depending on the history of yesses and nos. I guess you can define the question to include xors with previous questions, so that it effectively changes depending on the answers to others. But it's sometimes useful that the bitstrings are allowed to ...