This post is a follow-up to my “Alignment proposals and complexity classes” post. Thanks to Sam Eisenstat for helping with part of the proof here.
Previously, I proved that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling access PSPACE while AI safety via market making accesses EXP. At the time, I wasn't sure whether my market making proof would generalize to the others, so I just published it with the PSPACE proofs instead. However, I have since become convinced that the proof does generalize—and that it generalizes for all of the proposals I mentioned—such that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling all actually access EXP. This post attempts to prove that.
P: Imitation learning (trivial)
PSPACE: AI safety via debate (proof)
EXP: AI safety via market making (proof), Imitative amplification with weak HCH (proof below), Approval-based amplification (proof below), Recursive reward modeling (proof below)
NEXP: Debate with cross-examination (proof)
R: Imitative amplification with strong HCH (proof), AI safety via market making with pointers (proof)
The proof that imitative amplification with weak HCH accesses EXP is similar in structure to my previous proof that weak HCH accesses PSPACE, so I'll only explain where this proof differs from that one.
First, since $l \in \mathrm{EXP}$, we know that for any $x \in X$, $T_l(x)$ halts in $O(2^{\mathrm{poly}(n)})$ steps where $n = |x|$. Thus, we can construct a function $f_l(n) = c_1 + c_2 e^{c_3 n^{c_4}}$ such that for all $x \in X$, $T_l(x)$ halts in at most $f_l(|x|)$ steps, by picking $c_3, c_4$ large enough that they dominate all other terms in the polynomial for all $n \in \mathbb{N}$. Note that $f_l$ is then computable in time polynomial in $n$.
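One way to make the choice of constants concrete is sketched below. The names $a$, $n_0$, $q$, $a_k$, $d$, and $A$ are introduced only for this illustration and do not appear in the original argument; the sketch assumes a finite input alphabet, so that there are only finitely many inputs below any given length.

```latex
\begin{align*}
&\text{Assume } T_l(x) \text{ halts within } a \cdot 2^{q(n)} \text{ steps for all } n = |x| \ge n_0,
  \text{ where } q(n) = \sum_{k=0}^{d} a_k n^k. \\
&\text{For } n \ge 1: \quad q(n) \le \sum_{k=0}^{d} |a_k|\, n^k \le A\, n^d,
  \qquad A = \sum_{k=0}^{d} |a_k|. \\
&\text{Hence } a \cdot 2^{q(n)} \le a \cdot 2^{A n^d} = a\, e^{(A \ln 2)\, n^d},
  \text{ so take } c_2 = a,\; c_3 = A \ln 2,\; c_4 = d. \\
&\text{Let } c_1 = \max \{\, \text{steps taken by } T_l(x) : |x| < \max(n_0, 1) \,\},
  \text{ a maximum over finitely many halting runs.} \\
&\text{Then } T_l(x) \text{ halts within } f_l(n) = c_1 + c_2 e^{c_3 n^{c_4}} \text{ steps for every } x \in X.
\end{align*}
```

Under these assumptions, $c_1$ absorbs the short inputs not covered by the asymptotic bound, while $c_3$ and $c_4$ absorb every lower-order term of the polynomial in the exponent.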
Second, let H's new strategy be as follows:
Note that this strategy precisely replicates the strategy used in my proof that market making accesses EXP for inputs $p:i$ and $p:i:j$. Thus, I'll just defer to that proof for why the strategy works and is polynomial time on those inputs. Note that this is where $l \in \mathrm{EXP}$ becomes essential, as it guarantees that $i$ and $j$ can be represented in polynomial space.
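The polynomial-space claim for the indices can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes a hypothetical integer step bound of the form 2^(c·n^k) and assumes that queries of the form p:i:j write the indices in binary. Both `step_bound` and `query_length` are illustrative names.

```python
def step_bound(n: int, c: int = 1, k: int = 2) -> int:
    """Hypothetical exponential bound 2**(c * n**k) on the running time
    of the machine on inputs of length n (an assumption for illustration)."""
    return 2 ** (c * n ** k)


def query_length(p: str, i: int, j: int) -> int:
    """Length of the query 'p:i:j' when i and j are written in binary
    (the binary encoding is an assumption of this sketch)."""
    return len(f"{p}:{i:b}:{j:b}")


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        bound = step_bound(n)
        # The bound itself is exponential in n, but writing down any
        # index up to the bound takes only c * n**k + 1 bits.
        print(n, bound.bit_length(), query_length("x" * n, bound, bound))
```

Even though the number of steps grows exponentially, the number of bits needed to name a step (or a tape position reachable within that many steps) grows only polynomially, which is what lets the indices fit in a polynomial-size question.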
Then, the only difference between this strategy and the market making one is for the base $p$ input. On $p$, given that $M(p:i)$ works, $M(p:f_l(|x|))$ will always return a state after $T_l$ has halted, which will always be an accept state if $T_l$ accepts and a reject state if it rejects. Furthermore, since $|M(p:f_l(|x|))| \in O(1)$ and $f_l$ is computable in polynomial time, the strategy for $p$ is polynomial, as desired.
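The key fact used for the base input is that once the machine has halted, its state no longer changes, so asking for the state at any step at or beyond the halting time yields the final verdict. The sketch below illustrates only that fact on a hypothetical toy machine (one that accepts strings of 1s of even length). It simulates the machine directly, which is not the polynomial-time recursive strategy from the proof, and all names here are illustrative.

```python
BLANK = "_"
HALTING = {"accept", "reject"}

# Hypothetical toy machine: accepts strings of 1s with even length.
# (state, symbol) -> (next state, symbol to write, head movement)
DELTA = {
    ("even", "1"): ("odd", "1", 1),
    ("odd", "1"): ("even", "1", 1),
    ("even", BLANK): ("accept", BLANK, 0),
    ("odd", BLANK): ("reject", BLANK, 0),
}


def state_after(x: str, steps: int) -> str:
    """State of the toy machine on input x after `steps` steps.
    Halting states are treated as fixed points."""
    tape = dict(enumerate(x))
    state, head = "even", 0
    for _ in range(steps):
        if state in HALTING:
            break  # a halted machine stays in its halting state
        symbol = tape.get(head, BLANK)
        state, write, move = DELTA[(state, symbol)]
        tape[head] = write
        head += move
    return state


def decide(x: str, bound: int) -> bool:
    """Read off the verdict from the state at step `bound`, assuming
    `bound` is at least the machine's halting time on x."""
    return state_after(x, bound) == "accept"
```

For example, the toy machine halts on "1111" after 5 steps, so `decide("1111", 5)` and `decide("1111", 10**6)` agree, whereas `state_after("1111", 4)` is still the non-halting state `"even"`.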
Since this procedure works for all $l \in \mathrm{EXP}$, we get that imitative amplification with weak HCH accesses EXP, as desired.
The proof for approval-based amplification is almost precisely the same as my previous proof that approval-based amplification accesses PSPACE; the only modification is that we need to verify that $|M(q)| + |H(q, M)|$ is still polynomial in size for the new imitative amplification strategy given above. However, the fact that $i$ and $j$ are bounded above by $t_p$ means that they are representable in polynomial space, as desired.
Again, the proof for recursive reward modeling is almost precisely the same as my previous proof that recursive reward modeling accesses PSPACE, with the only modification being the same as in the above proof that approval-based amplification accesses EXP.