AI ALIGNMENT FORUM

Emil Ryd — AI Alignment Forum

Supervised fine-tuning as a method for training-based AI control

Executive summary: This post is a research update on our ongoing project studying training-based AI control, which uses training to mitigate risk from misaligned AI systems in diffuse threat models such as research sabotage. This post presents our results on supervised fine-tuning (SFT), the simplest training-based approach. We evaluate our training measures...

Nov 13, 2025 • 40