> Let your every day be full of joy, love the child that holds your hand, let your wife delight in your embrace, for these alone are the concerns of humanity.[1] > > — Epic of Gilgamesh - Tablet X Say we want to train a scientist AI to help...
This post was written as part of the work done at Conjecture. You can try input swap graphs in a collab notebook or explore the library to replicate the results. Thanks to Beren Millidge and Eric Winsor for useful discussions throughout this project. Thanks to Beren for feedback on a...
To learn more about this work, check out the paper. We assume general familiarity with transformer circuits. Intro: There isn’t much interpretability work that explains end-to-end how a model is able to do some task (except for toy models). In this work, we make progress towards this goal by understanding...