TL;DR Large language models have demonstrated an emergent ability to write code, but this ability requires an internal representation of program semantics that is little understood. Recent interpretability work has demonstrated that it is possible to extract internal representations of natural language concepts, raising the possibility that similar techniques could be used to extract program semantics concepts. In this work, we study how large language models represent the nullability of program values. We measure how well models of various sizes at various training checkpoints complete programs that use nullable values, and then extract an internal representation of nullability.
The last five years have shown us that large language models, like ChatGPT, Claude, and DeepSeek, can effectively write programs in many domains. This is an impressive capability, given that...