AI ALIGNMENT FORUM

John Hughes

Former MATS scholar working on scalable oversight and adversarial robustness.

Comments
Tips and Code for Empirical Research Workflows
John Hughes · 9mo · 20

Thanks Neel! I'm glad you found it helpful. If you or your scholars would recommend any other tools not mentioned in the post, I'd be interested to hear about them.

Posts

74 · Why Do Some Language Models Fake Alignment While Others Don't? · 3mo · 2
70 · Alignment Faking Revisited: Improved Classifiers and Open Source Extensions · 6mo · 6
40 · Tips and Code for Empirical Research Workflows · 9mo · 2
31 · Best-of-N Jailbreaking · 10mo · 1
47 · Debating with More Persuasive LLMs Leads to More Truthful Answers · 2y · 7