TL;DR: LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs on synthetic data about good AIs helps them become more aligned. These alignment priors persist through post-training, providing alignment-in-depth. We recommend labs pretrain for alignment, just as they do for capabilities. Website: alignmentpretraining.ai Us: geodesicresearch.org...
Background

Deliberative alignment is a powerful post-training alignment technique: re-contextualised supervised fine-tuning (SFT) datasets are generated with a set of principles in context, and the model is then trained on them. The process takes three steps:

1. With the set of principles[1] (henceforth: the constitution) in a base model's[2] context, the base model...
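The generation side of this re-contextualisation can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate` is a hypothetical stand-in for a base-model sampling call, and the toy constitution is made up. The key move is that responses are sampled with the constitution in context but stored against the bare prompt, so SFT teaches the model to follow the principles even when they are absent.

```python
# Hypothetical sketch of re-contextualised SFT data generation for
# deliberative alignment. Everything here is a stub so the example
# runs standalone; no real model is called.

CONSTITUTION = "1. Be helpful. 2. Refuse requests for harm."


def generate(prompt: str) -> str:
    # Stub for a base-model completion call (an assumption, not a real API).
    return f"[reasoning about the principles] ... answer to: {prompt}"


def build_recontextualised_sft(prompts: list[str]) -> list[dict]:
    dataset = []
    for user_prompt in prompts:
        # Sample WITH the constitution in context.
        full_prompt = f"{CONSTITUTION}\n\nUser: {user_prompt}"
        response = generate(full_prompt)
        # Store the pair WITHOUT the constitution: the SFT target is
        # conditioned only on the bare user prompt.
        dataset.append({"prompt": user_prompt, "completion": response})
    return dataset


data = build_recontextualised_sft(["How do I bake bread?"])
print(data[0]["prompt"])
```

In a real pipeline the stub would be replaced by sampling from the base model (often with the reasoning traces filtered for quality before training), but the dataset shape is the same.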