I like this. It's like a structural version of control evaluations. Will think about where to put it.

Lawrence Chan (3mo)
Expanding on this: this whole area is probably best known as "AI Control", and I'd lump it under "Control the thing" as its own category. I'd also move Control Evals to this category, though someone at RR would know better than I.