Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing
Dear Alignment Forum Members,

We recently reached out to Oliver from Safe.ai regarding their work on HarmBench, an adversarial evaluation benchmark for LLMs. He confirmed that while they are not planning a follow-up, we have their blessing to expand upon the experiment. Given the rapid evolution of language models and...
Mar 3, 2025