AI ALIGNMENT FORUM

John Hughes

Former MATS scholar working on scalable oversight and adversarial robustness.

Comments
Tips and Code for Empirical Research Workflows
John Hughes · 9mo · 20

Thanks Neel! I'm glad you found it helpful. If you or your scholars would recommend any other tools not mentioned in the post, I'd be interested to hear about them.

Posts

74 · Why Do Some Language Models Fake Alignment While Others Don't? · 3mo · 2
70 · Alignment Faking Revisited: Improved Classifiers and Open Source Extensions · 6mo · 6
40 · Tips and Code for Empirical Research Workflows · 9mo · 2
31 · Best-of-N Jailbreaking · 10mo · 1
47 · Debating with More Persuasive LLMs Leads to More Truthful Answers · 2y · 7