AI ALIGNMENT FORUM

Simon Lermen
Ω63000

Twitter: @SimonLermenAI

Posts

unRLHF - Efficiently undoing LLM safeguards (52 karma, 2y, 2 comments)
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B (51 karma, 2y, 0 comments)
Robustness of Model-Graded Evaluations and Automated Interpretability (14 karma, 2y, 2 comments)