Top postsTop post
Context:I (Daniel C) have been working with Aram Ebtekar on various directions in his work on algorithmic thermodynamics and the causal arrow of time. This post explores some implications of algorithmic thermodynamics on the concept of optimization. All mistakes are my (Daniel's) own. A typical picture of optimization is when...
This is a link post for two papers that came out today: * Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time (Tan et al.) * Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (Wichers et al.) These papers both study the following...