tjbai — AI Alignment Forum

An issue with training schemers with supervised fine-tuning

tjbai · 2y

It's not clear to me that you do get stronger guarantees, because the setting and method are so similar to those of classical imitation learning. In both cases, we seek to learn a policy that is aligned with the expert (human). Supervised fine-tuning (behavioral cloning) is problematic because of distribution shift: the learned policy's errors compound (the extra cost can grow quadratically in the task horizon!), and it ends up visiting states it never saw in training.

You say this failure mode is dangerous because of scheming AI, and I say it's dangerous because the policy is OOD, but it appears you agre...
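To make the quadratic rate concrete, here is a toy calculation (an illustrative sketch under stated assumptions, not a formal bound). Assume, hypothetically, that while the learner stays on the expert's state distribution it errs with probability `eps` per step, and that after its first error it is in unseen states and pays cost 1 on every remaining step. Summing the probability of being off-distribution at each step gives at most `eps * T * (T + 1) / 2`, i.e. O(εT²), compared with εT if mistakes did not compound. The function names `compounding_cost` and `iid_cost` are hypothetical.

```python
def compounding_cost(eps: float, horizon: int) -> float:
    """Expected total cost when the first mistake sends the policy off-distribution for good.

    Toy assumption: on the expert's distribution the learner errs with probability eps
    per step; after its first error it is in unseen states and (pessimistically)
    incurs cost 1 on every remaining step, including the step of the error.
    """
    # P(off-distribution by step t) = 1 - (1 - eps) ** t; sum it over the horizon.
    return sum(1.0 - (1.0 - eps) ** t for t in range(1, horizon + 1))


def iid_cost(eps: float, horizon: int) -> float:
    """Expected total cost if each mistake cost 1 and did not affect later steps."""
    return eps * horizon


if __name__ == "__main__":
    eps = 0.001
    for horizon in (10, 20, 40, 80):
        print(horizon, round(iid_cost(eps, horizon), 4), round(compounding_cost(eps, horizon), 4))
```

Running it for small `eps`, doubling the horizon roughly quadruples the compounding cost but only doubles the non-compounding one.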

An issue with training schemers with supervised fine-tuning

tjbai · 2y

How does this differ from DAgger (https://arxiv.org/abs/1011.0686)?
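For context, the core DAgger loop can be sketched as below. This is a minimal toy sketch, not the paper's implementation: the chain environment, `WIND_PROB`, the tabular learner and its `+1` fallback for unseen states are all hypothetical choices made for illustration. It uses the simple variant in which the expert's own rollouts seed the dataset and only the learner's rollouts are used afterwards, and it returns the last policy rather than selecting the best one on validation data.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy environment: integer states 0..N_STATES-1, goal at state 0.
N_STATES = 10
START = 4
HORIZON = 20
WIND_PROB = 0.3  # chance per step that a gust pushes the agent one state away from the goal


def expert(state: int) -> int:
    """Expert policy: step toward the goal, stay once there."""
    return -1 if state > 0 else 0


def step(state: int, action: int, rng: random.Random) -> int:
    gust = 1 if rng.random() < WIND_PROB else 0
    return min(max(state + action + gust, 0), N_STATES - 1)


def fit(dataset):
    """Tabular 'supervised learning': majority label per state.

    Unseen states fall back to +1, a deliberately bad default standing in for
    arbitrary behaviour off the training distribution.
    """
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda state: table.get(state, +1)


def rollout(policy, rng: random.Random):
    """Run one episode and return the states where the policy chose an action."""
    state, visited = START, []
    for _ in range(HORIZON):
        visited.append(state)
        state = step(state, policy(state), rng)
    return visited


def behavioral_cloning(n_demos: int, rng: random.Random):
    # Train only on states the expert itself visits.
    data = [(s, expert(s)) for _ in range(n_demos) for s in rollout(expert, rng)]
    return fit(data), data


def dagger(n_iters: int, n_rollouts: int, rng: random.Random):
    # Seed the dataset with expert rollouts (i.e. start from behavioral cloning).
    policy, data = behavioral_cloning(n_rollouts, rng)
    for _ in range(n_iters):
        # Roll out the current learner, but label every state it visits with the expert's action.
        for _ in range(n_rollouts):
            data += [(s, expert(s)) for s in rollout(policy, rng)]
        policy = fit(data)  # retrain on the aggregated dataset
    return policy, data


if __name__ == "__main__":
    rng = random.Random(0)
    _, bc_data = behavioral_cloning(n_demos=5, rng=rng)
    _, dagger_data = dagger(n_iters=5, n_rollouts=5, rng=rng)
    print("states labelled by BC:    ", sorted({s for s, _ in bc_data}))
    print("states labelled by DAgger:", sorted({s for s, _ in dagger_data}))
```

The difference from plain supervised fine-tuning is the data-collection line inside the loop: training data are gathered on the states the learner itself reaches, so states it drifts into after a gust receive expert labels in the next round.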