Avoiding jailbreaks by discouraging their representation in activation space — AI Alignment Forum