Don't design agents which exploit adversarial inputs — AI Alignment Forum