Dmitry Vaintrob — AI Alignment Forum

Steelmanning heuristic arguments

Introduction. This is a nuanced “I was wrong” post. Something I really like about AI safety and EA/rationalist circles is the ease and positivity in people’s approach to being criticised.[1] For all the blowups and stories of representative people in the communities not living up to the stated values, my...

Apr 13, 2025 · 77

Renormalization Redux: QFT Techniques for AI Interpretability

by Lauren Greenspan and Dmitry Vaintrob

Introduction: Why QFT? In a previous post, Lauren offered a take on why a physics way of thinking is so successful at understanding AI systems. In this post, we look in more detail at the potential of quantum field theory (QFT) to be expanded into a more comprehensive framework for...

Jan 18, 2025 · 47

The subset parity learning problem: much more than you wanted to know

Imagine that you’re looking for buried treasure, worth a billion dollars, on a large desert island. You don’t have a map, but a mysterious hermit offers you a box with a button to help find the treasure. Each time you press the button, it will tell you either “warmer” or...

Jan 3, 2025 · 106

Grammars, subgrammars, and combinatorics of generalization in transformers

Introduction. This is the first installment of my January writing project. We will look at generative neural networks from the framework of (probabilistic) “formal grammars”, specifically focusing on building a complex grammar out of simple “rule grammars”. This turns out to lead to a nice and relatively non-technical way of...

Jan 2, 2025 · 36

Toward A Mathematical Framework for Computation in Superposition

Author order randomized. Authors contributed roughly equally — see attribution section for details. Update as of July 2024: we have collaborated with @LawrenceC to expand section 1 of this post into an arXiv paper, which culminates in a formal proof that computation in superposition can be leveraged to emulate sparse...

Jan 18, 2024 · 214

Investigating the learning coefficient of modular addition: hackathon project

by Nina Panickssery and Dmitry Vaintrob

As our project at the Melbourne hackathon on Singular Learning Theory and alignment (Oct. 7-8), we did some experiments to estimate the learning coefficient, an invariant that measures the information complexity (read: program length) of a fully trained neural net, for the single-layer modular addition task at a basin. We...

Oct 17, 2023 · 97