AI ALIGNMENT FORUM
All of technicalities's Comments + Replies
Shallow review of live agendas in alignment & safety
I like this. It's like a structural version of control evaluations. Will think about where to put it.
Expanding on this -- this whole area is probably best known as "AI Control", and I'd lump it under "Control the thing" as its own category. I'd also move Control Evals to this category, though someone at RR would know better than I.