Censoring out-of-domain representations — AI Alignment Forum