Have you ever binge-watched a TV series? Binge-watching puts you in a very peculiar mental state. Assuming you don't reflectively endorse your binge-watching behavior, you'd probably feel pretty bad if you were to reflect on your situation. You might think: "Man, I am wasting my time. I still need to...
TL;DR: S-expressions are a minimal structure language for early-stage conceptual engineering (aka deconfusion). They are especially useful in alignment, where the hardest part is often figuring out what the problem even is. S-expressions do not enforce any semantics. This lets you write down structure before you know what the structure...
TL;DR: Most alignment work focuses either on theoretical deconfusion or on interpreting opaque models. This post argues for a third path: constraining general intelligence through structural control of cognition. Instead of aligning outcomes, we aim to bound the reasoning process by identifying formal constraints on how plans are generated, world models are...
Consider reading this instead. Here is some obvious advice. I think a common failure mode when working on AI alignment[1] is to not focus on the hard parts of the problem first. This is a problem when generating a research agenda, as well as when working on any specific research...
Status: Some rough thoughts and intuitions. 🪧 indicates signposting. TL;DR: If we make our optimization procedures transparent, we might be able to analyze them in toy environments to build understanding that generalizes to the real world and to more powerful, scaled-up versions of the systems. 🪧 Let's remind ourselves why...
Some people who are not motivated to solve AI alignment nonetheless do work related to it, e.g. on adversarial robustness, on understanding how to do science mechanically, or on advancing other paradigms that are more interpretable than modern ML. These people might be interested in the science, or work...
Let's suppose that an Em researcher runs 1000 times faster than the equivalent human brain. To the Em researcher who runs faster, it will seem like the experiments take significantly longer to run. So there is more waiting around. I expect that a collection of Ems would still be able...