Scalable oversight is the problem of reliably supervising the outputs of AI systems even as they become more capable than humans. Common approaches have groups of weaker AIs supervise a stronger AI, or set AIs against each other in a zero-sum debate whose outcome a human judges.
Scalable oversight techniques aim to make it easier for humans to evaluate the outputs of AIs, or to provide a reliable training signal that cannot be easily reward-hacked.
Variants include AI safety via debate, iterated distillation and amplification, and imitative generalization.
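The zero-sum debate setup can be illustrated with a toy sketch: two debaters argue opposite answers about a question the judge cannot check directly, each revealing the evidence most favourable to its side, and the judge rules on the revealed evidence alone. Everything here (the hidden fact list, the `debater` and `judge` functions) is a hypothetical stand-in for trained models, not any real debate implementation.

```python
def debater(argue_positive, facts, n_turns):
    # Toy stand-in for a trained debater model: reveal the n_turns facts
    # most favourable to the side it was assigned (largest if arguing the
    # sum is positive, smallest if arguing it is negative).
    return sorted(facts, reverse=argue_positive)[:n_turns]

def judge(revealed_pro, revealed_con):
    # Toy stand-in for a weak judge: it never sees the full fact list,
    # only the evidence each debater chose to reveal.
    return sum(revealed_pro) + sum(revealed_con) > 0

def debate(hidden_facts, n_turns=2):
    # Question: "is the sum of the hidden facts positive?" The zero-sum
    # structure pushes each side to surface its strongest evidence, which
    # lets the judge reach the right verdict without seeing everything.
    pro = debater(True, hidden_facts, n_turns)
    con = debater(False, hidden_facts, n_turns)
    return judge(pro, con)

print(debate([3, -1, 4, -1, 5]))    # true sum is 10, judge says positive
print(debate([-3, 1, -4, 1, -5]))   # true sum is -10, judge says negative
```

In this toy, the adversarial incentive is doing the work: neither debater can hide unfavourable evidence, because its opponent is rewarded for surfacing it, so a judge weaker than either debater can still rule correctly.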