AI ALIGNMENT FORUM

Henry Sleight

Comments
Tips for Empirical Alignment Research
Henry Sleight · 2y · 31

First off: as one of Ethan's current Astra Fellows (I've worked with him since roughly last October), I think his collaborators in MATS and Astra have historically underweighted how valuable overcommunicating with Ethan is, and routinely book too few meetings to ask for his support.

Second, this post is so dense with useful advice that I made Anki flashcards of it using GPT-4 (generated via AnkiBrain [https://ankiweb.net/shared/info/1915225457], with small manual edits).

You can find them here: https://drive.google.com/file/d/1G4i7iZbILwAiQ7FtasSoLx5g7JIOWgeD/view?usp=sharing
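For readers who want to reproduce something similar without AnkiBrain, here is a minimal sketch of the general idea: ask an LLM for tab-separated question/answer pairs, then write them as an Anki-importable file. This assumes the `openai` Python package (v1+) and an `OPENAI_API_KEY` in the environment; the function name and prompt are illustrative, not AnkiBrain's actual code.

```python
# Minimal sketch: generate Anki-style flashcards from a post with an LLM.
# NOT AnkiBrain's implementation -- just an illustration of the workflow.
import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def make_flashcards(post_text: str, out_path: str = "cards.tsv") -> None:
    """Ask the model for Q/A pairs and write an Anki-importable TSV."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": ("Turn the following post into flashcards. "
                         "Output one card per line, formatted as the "
                         "question, a single tab character, then the "
                         "answer, with no other text.")},
            {"role": "user", "content": post_text},
        ],
    )
    cards = response.choices[0].message.content.strip().splitlines()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for card in cards:
            if "\t" in card:  # skip any malformed lines
                writer.writerow(card.split("\t", 1))
```

The resulting TSV can be imported through Anki's File → Import, after which small manual edits (like the ones mentioned above) clean up the weaker cards.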

Posts
- Best-of-N Jailbreaking (31 karma · 9mo · 1 comment)
- Reward hacking behavior can generalize across tasks (43 karma · 1y · 1 comment)
- Inducing Unprompted Misalignment in LLMs (17 karma · 1y · 2 comments)
- How I select alignment research projects (14 karma · 1y · 4 comments)