AI ALIGNMENT FORUM

Cipolla — AI Alignment Forum

Top posts

Cipolla

13

Ω

6

2y

Looking for backdoors in Jane Street LLMs

This post describes my experience with the Jane Street LLM backdoor challenge and shares partial results. I managed to crack some of the models using white-box methods, after the activation/prompting approach didn't pan out. I'm happy to discuss better or more promising approaches. Introduction A few months...