AI ALIGNMENT FORUM
Simon Lermen
Posts
unRLHF - Efficiently undoing LLM safeguards (51 karma, 2mo ago, 2 comments)
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B (49 karma, 2mo ago, 0 comments)
Robustness of Model-Graded Evaluations and Automated Interpretability (11 karma, 4mo ago, 2 comments)