AI ALIGNMENT FORUM

Henry Sleight

Comments
Tips for Empirical Alignment Research
Henry Sleight · 1y · 31

First off: as one of Ethan's current Astra Fellows (I've worked with him since roughly last October), I think his collaborators in MATS and Astra have historically underweighted how valuable overcommunicating with Ethan is, and routinely book too few meetings to ask for his support.

Second, this post is so dense with useful advice that I made Anki flashcards of it using GPT-4 (generated via AnkiBrain [https://ankiweb.net/shared/info/1915225457], with small manual edits).

You can find them here: https://drive.google.com/file/d/1G4i7iZbILwAiQ7FtasSoLx5g7JIOWgeD/view?usp=sharing
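For anyone who wants to reproduce something similar without the AnkiBrain add-on, here is a minimal sketch, assuming the openai Python package and an OPENAI_API_KEY in the environment. The prompt, model name, and TSV output format are my own illustrative choices, not the exact pipeline used above:

import csv
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def make_flashcards(source_text: str, out_path: str = "cards.tsv") -> None:
    """Ask the model for Q/A pairs and write them as a TSV that Anki can import."""
    prompt = (
        "Turn the following advice post into flashcards. "
        "Output one card per line, formatted as: question<TAB>answer.\n\n"
        + source_text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        for line in lines:
            if "\t" in line:  # skip any malformed lines the model emits
                question, answer = line.split("\t", 1)
                writer.writerow([question, answer])

To import the result, use File -> Import in Anki and select Tab as the field separator; you would still want to spot-check and lightly edit the cards, as described above.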

Posts

31 · Best-of-N Jailbreaking · 7mo · 1 comment
43 · Reward hacking behavior can generalize across tasks · 1y · 1 comment
17 · Inducing Unprompted Misalignment in LLMs · 1y · 2 comments
14 · How I select alignment research projects · 1y · 4 comments