Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
by Cam, Puria, Kyle O’Brien, David Africa, Samuel Ratnam, and andyk
TL;DR: LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with synthetic data about good AIs helps them become more aligned. These alignment priors persist through post-training, providing alignment-in-depth. We recommend labs pretrain for alignment, just as they do for capabilities. Website: alignmentpretraining.ai; Us: geodesicresearch.org...
Dec 21, 2025