A company that made machine learning software for drug discovery, on hearing about the security concerns around these sorts of models, asked "huh, I wonder how effective that sort of misuse would be?" and within 6 hours had not only rediscovered one of the most potent known chemical warfare agents (VX), but also generated a large number of candidates that the model predicted were even more deadly.
This is basically a real-world example of "it just works to flip the sign of the utility function and turn a 'friend' into an 'enemy'"; it was slightly more complicated here, as they jointly optimized two targets in the drug discovery process (toxicity and bioactivity), and only the toxicity target was flipped. [This makes sense--you'd still want your chemical warfare agent to be bioactive.] It also required a little bit of domain knowledge--they had to specify which sort of bioactivity to look for, and picked one that would point towards this specific agent.
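To make the sign-flip concrete, here's a minimal sketch of that two-target objective; the `predict_toxicity` and `predict_bioactivity` names are hypothetical stand-ins for the paper's learned scoring models, not their actual code:

```python
def drug_score(molecule, predict_toxicity, predict_bioactivity):
    # Normal drug discovery: reward bioactivity, penalize toxicity.
    return predict_bioactivity(molecule) - predict_toxicity(molecule)

def agent_score(molecule, predict_toxicity, predict_bioactivity):
    # The misuse case: only the toxicity term's sign flips; bioactivity
    # (against an attacker-chosen target) is still rewarded.
    return predict_bioactivity(molecule) + predict_toxicity(molecule)
```

The point is how small the edit is: one character in the objective, plus a choice of which bioactivity model to plug in.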
Worth remembering that flips of the reward function do happen: https://openai.com/blog/fine-tuning-gpt-2/#bugscanoptimizeforbadbehavior ("Was this a loss to minimize or a reward to maximize...")
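As a toy illustration of that bug class (not OpenAI's code), the same scalar "badness" signal fed to an optimizer under the wrong convention trains toward exactly the behavior it was meant to suppress:

```python
def step(theta, badness, lr=0.1, treat_as="loss"):
    # One gradient step on a scalar "badness" signal, via finite differences.
    grad = (badness(theta + 1e-5) - badness(theta - 1e-5)) / 2e-5
    if treat_as == "loss":
        return theta - lr * grad  # correct: minimize badness
    return theta + lr * grad      # the sign-flip bug: maximize badness

# With badness = lambda x: x**2, the "loss" convention walks theta toward 0;
# the buggy "reward" convention walks it away.
```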