Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing
Dear Alignment Forum Members,

We recently reached out to Oliver from Safe.ai regarding their work on HarmBench, an adversarial evaluation benchmark for LLMs. He confirmed that while they are not planning a follow-up, we have their blessing to expand upon the experiment. Given the rapid evolution of language models and...
Mar 3, 2025