This post is sort of an intermediate between parts 1 and 2 of the sequence. It makes three points that I think people tend to get wrong.
It's possible to lose sight of why Factored Cognition is employed in the first place. In particular, it is not a way to boost capability: while the amplification step in IDA is implemented via Factored Cognition, the purpose of this is alignment. IDA would be more straightforward (and more similar to AlphaZero) if each $A_k$ were trained to approximate [$A_{k-1}$ thinking for longer] rather than $[H^{\text{access}\to A_{k-1}}]$.
Recall from post #-2 that I've framed the entire problem of AI risk as 'once systems become too capable, it gets difficult for a human to provide a training signal or training data'. There are many approaches to getting aligned systems anyway: there's ambitious value learning, there's trying to develop a new framework, there's impact measures, there's norm following, there's avoiding agents altogether, et cetera. But another option is to reduce the task of providing the training signal to human judgment (and to do so for each instance separately), which is what IDA and Debate are trying to do. This is what Factored Cognition is used for. In a previous version of this post, I dubbed the approach 'narrow alignment proving', since the system repeatedly 'proves' that it gives true answers. Accordingly, the proper way to view Factored Cognition is as a tool to achieve outer alignment.
In stock IDA, this corresponds to the fact that we get outer alignment of each $A_k$ by induction, precisely because each amplification step is implemented via Factored Cognition. In Debate, this corresponds to the debate game itself: take out everything after the first statement by the first agent, and you get precisely the classical Oracle AI setup.
My impression has been that some people view IDA and Debate as quite different. They can get a handle on IDA, perhaps because of its similarity to existing Machine Learning techniques, but having two agents debate each other sounds exotic.
However, I think the proper way to view Debate, especially in the limit, is simply as stock IDA plus a decomposition Oracle. If you consider an HCH tree solving a problem, say deriving a math proof, you get a transcript that corresponds to a Debate Tree, the formal object I've introduced in post #2. Granted, it's going to be a much larger, less elegant Debate Tree. But add a decomposition Oracle (that all nodes in the HCH tree can use), and the two become almost identical.
In Debate, only one path of the tree really occurs when the scheme is executed; the rest remains implicit. But this is also analogous to IDA, where only the topmost node really occurs. Both schemes have the human look at a tiny part of the Cognition Space (although they differ on which part it is).
I think it is almost fair to say that Ideal Debate $=$ HCH $+$ Decomposition Oracle. In the concrete schemes, the $=$ becomes an $\approx$ since the implementation challenges are different.
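To make the correspondence concrete, here is a toy sketch. All names and the integer "difficulty" semantics are my own illustrative assumptions, not part of either scheme: an HCH-style solver with a decomposition oracle traverses the entire tree, while a Debate-style run makes only one root-to-leaf path through the same tree explicit.

```python
# Toy sketch -- all names and the integer "difficulty" semantics are my
# own illustrative assumptions, not part of either scheme.

def decompose(n):
    """Stand-in for the decomposition oracle: split a question of
    difficulty n into two strictly easier subquestions."""
    return [n // 2, n - n // 2]

def hch(n, visited):
    """HCH with the oracle: recursively answer every subquestion, so the
    entire tree is traversed and the transcript is large."""
    visited.append(n)
    if n <= 1:
        return n                          # easy enough to answer directly
    return sum(hch(m, visited) for m in decompose(n))

def debate_path(n, visited):
    """Debate over the same tree: recurse into only ONE subquestion, so
    just a single root-to-leaf path becomes explicit."""
    visited.append(n)
    if n <= 1:
        return n                          # the judge verifies this leaf
    return debate_path(max(decompose(n)), visited)

hch_nodes, path_nodes = [], []
hch(64, hch_nodes)                        # visits all 127 nodes
debate_path(64, path_nodes)               # visits only 7 nodes
print(len(hch_nodes), len(path_nodes))    # 127 7
```

The underlying tree is the same object in both cases; the schemes differ only in how much of it is made explicit, which is the sense in which Debate is IDA plus a decomposition Oracle.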
Factored Cognition can seem hard to get a grasp on if it is viewed as an infinite process of decomposing a problem further and further, especially if we start to consider meta questions, where the task of decomposing a problem is itself decomposed.
However, in both schemes, Factored Cognition includes a step that is by definition non-decomposable. In Ideal Debate, this step is judging the final statement. In HCH, it is solving a problem given access to the subtrees. This step is also entirely internal to the human: the human has to carry it out unaided, without any further decomposition.
Note that this is in line with the formalism from posts #1 and #2: statements have difficulties, and at some point, the judge needs to verify one. This works iff the difficulty isn't too high. This statement cannot be decomposed due to the way it was chosen (if it could, the first agent would have done so, and the judge wouldn't have to verify it).
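As a toy illustration of this terminal verification step (the claim language, the sum semantics, and the numeric threshold are made-up assumptions, not notation from the formalism): statements carry a difficulty, and the judge can verify one directly only if its difficulty is below the judge's capability.

```python
# Toy sketch (claim language and numbers are illustrative assumptions):
# the judge can directly verify a statement only if its difficulty is
# at most the judge's capability c; anything harder should already have
# been decomposed away before reaching the judge.

JUDGE_CAPABILITY = 2  # hypothetical threshold c

def can_verify(difficulty):
    """The judge verifies a statement iff the difficulty isn't too high."""
    return difficulty <= JUDGE_CAPABILITY

def judge_final_statement(subanswers, claimed_total, difficulty):
    """Verify the final statement 'the subanswers sum to claimed_total',
    but only when it is easy enough to check directly."""
    if not can_verify(difficulty):
        return None            # too hard: should have been decomposed
    return sum(subanswers) == claimed_total

print(judge_final_statement([3, 4], 7, difficulty=1))   # True
print(judge_final_statement([3, 4], 8, difficulty=1))   # False
print(judge_final_statement([3, 4], 7, difficulty=5))   # None (too hard)
```

The `None` branch corresponds to the case the scheme is designed to rule out: by construction, the final statement is chosen so that it never exceeds the judge's capability.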
I think a good way to think about the question 'is HCH capable of solving hard problems?' is to take the task 'solve a problem given access to an oracle that can solve slightly easier problems', consider its difficulty as a function $f$ of the difficulty $x$ of the input problem, and then ask: how fast does $f$ grow as a function of $x$?
The same framing also works for Debate, where $f$ is the difficulty of judging $s_{\text{final}}$ and $x$ is the complexity of the initial answer $a_q$ to the input question.
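One way to make this framing precise (my own formalization sketch; the symbols $d$ and $c$ are assumptions, not notation from the sequence): let $d$ be the difficulty measure and $c$ the fixed capability of the human.

```latex
% f(x): difficulty of the one-step task, as a function of input difficulty x
f(x) \;=\; d\Bigl(\text{solve } q \text{ with } d(q) = x,
  \ \text{given an oracle for all } q' \text{ with } d(q') < x\Bigr)

% HCH's reach: the largest difficulty at which every one-step task
% stays within the human's capability c
x_{\max} \;=\; \sup \,\bigl\{\, x \;\bigm|\; f(x') \le c \ \text{for all } x' \le x \,\bigr\}
```

On this sketch, if $f$ stays bounded by $c$, HCH scales indefinitely; if $f$ grows roughly linearly in $x$, decomposition buys very little.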
It's also worth pointing out that the second step is the only part where mistakes can come in. In both idealized schemes, correctness of Factored Cognition comes down to a human verifying whether or not an implication of the form $(s_1,\dots,s_n)\Longrightarrow s$, what we've called an explanation, is valid.
The notation $f\in\Theta(g)$ is defined as $f\in O(g)\wedge g\in O(f)$ and means that $f$ and $g$ grow asymptotically equally fast.