AI ALIGNMENT FORUM

Oscar Balcells Obeso

Math undergrad at ETH Zurich.

More info: oscarbalcells.com

Posts

Refusal in LLMs is mediated by a single direction (77 karma, 1y ago, 44 comments)
Refusal mechanisms: initial experiments with Llama-2-7b-chat (34 karma, 2y ago, 1 comment)