Quentin FEUILLADE--MONTIXI

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

by Soroush Pour, rusheb, Quentin FEUILLADE--MONTIXI, Arush, and scasper

Paper coauthors: Rusheb Shah, Quentin Feuillade--Montixi, Soroush J. Pour, Arush Tagade, Stephen Casper, Javier Rando.

Motivation: Our research team was motivated to show that state-of-the-art (SOTA) LLMs like GPT-4 and Claude 2 are not robust to misuse risk and can't be fully aligned to the desires of their creators, posing...

Nov 7, 2023

Quentin Feuillade--Montixi

Studying The Alien Mind

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Three Properties for Alignment (and Why We're Not Training Them)

The Topology of LLM Behavior
