At PIBBSS, we’ve been thinking about how renormalization can be developed into a rich framework for AI interpretability. This document serves as a roadmap for this research agenda – which we are calling an Opportunity Space[1] for the AI safety community. In what follows, we explore the technical and philosophical...
Introduction: Why QFT? In a previous post, Lauren offered a take on why a physics way of thinking is so successful at understanding AI systems. In this post, we look in more detail at the potential of quantum field theory (QFT) to be expanded into a more comprehensive framework for...
Context: This is part of a series of posts I am writing with Dmitry Vaintrob, in which we aim to unpack some of the potential value of quantum field theory (QFT). Consider this post a framing of why physics and its frameworks can be good for building a science of AI. Introduction In Position:...
Epistemic Status: This post is an attempt to condense some ideas I've been thinking about for quite some time. I took some care grounding the main body of the text, but some parts (particularly the appendix) are pretty off the cuff, and should be treated as such. The magnitude and...
Background This project was inspired by Anthropic’s post on attention head superposition, which constructed a toy model trained to learn a circuit to identify skip-trigrams that are OV-incoherent (attending from multiple destination tokens to a single source token) as a way to ensure that superposition would occur. Since the OV...