Simon Lermen — AI Alignment Forum
Simon Lermen
Substack: https://substack.com/@simonlermen
Twitter: @SimonLermenAI
Posts
Simon Lermen's Shortform (2mo)
unRLHF - Efficiently undoing LLM safeguards (2y)
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B (2y)
Robustness of Model-Graded Evaluations and Automated Interpretability (2y)