Posts

Sorted by New

10Tensor Trust: An online game to uncover prompt injection vulnerabilities

8mo

0

Wiki Contributions

Comments

Tips for Empirical Alignment Research

qxcv2mo10

For highly empirical research, it’s critical to get quick feedback and iterate on ideas rapidly. Jacob Steinhardt has a great blog post describing that a really good strategy for doing research is to “reduce uncertainty at the fastest possible rate”

Michael Bernstein's slides on velocity are a great resource for learning this mindset this as well. I particularly like his metaphor of the "swamp". This is the place you get stuck when you really want technique X to work for the project to progress, but none of the ways that you've tried applying it have succeeded. The solution is to have high velocity: that is, to test out as many ideas as possible per unit time until you get out the swamp. Other highlights of the slide deck include the focus on answering questions rather than doing engineering, and the related core-periphery distinction between things that are strictly needed to answer a question & those that can be ignored/mocked up/replaced for testing (which echoes the ideas in the "workflow" section of this post).

(Although they're similar, I'd argue that Michael's approach is easier to apply to empirical alignment research than Jacob's "stochastic decision process" approach. That's because falsifying abstract research ideas in empirical deep learning is hard (impossible?), and you don't get much generalizable knowledge from failing to get one idea to work. The real aim is to find one deep insight that does generalize—hence the focus on trying many distinct approaches.)

Reply