AI ALIGNMENT FORUM
Adversarial Training
Posts tagged Adversarial Training
Most Relevant
1
57
Takeaways from our robust injury classifier project [Redwood Research]
dmz
2y
7
1
52
Deep Forgetting & Unlearning for Safely-Scoped LLMs
Stephen Casper
4mo
5
1
17
Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing
Buck Shlegeris
2y
0
1
14
Adversarial Robustness Could Help Prevent Catastrophic Misuse
Aidan O'Gara
4mo
15
2
10
AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler
DanielFilan
2y
0
1
6
Some thoughts on why adversarial training might be useful
Beth Barnes
2y
4
1
21
Latent Adversarial Training
Adam Jermyn
2y
3
1
15
EIS IX: Interpretability and Adversaries
Stephen Casper
1y
5
1
14
Oversight Leagues: The Training Game as a Feature
Paul Bricman
2y
0
1
6
EIS XI: Moving Forward
Stephen Casper
1y
2
1
5
EIS XII: Summary
Stephen Casper
1y
0
0
0
Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI
Benaya Koren
9mo
0