AI ALIGNMENT FORUM

Mikita Balesni

Comments

Sorted by
Newest
No posts to display.
No wikitag contributions to display.
Frontier Models are Capable of In-context Scheming
Mikita Balesni (10mo)

I think one practical difference is whether filtering pre-training data to exclude cases of scheming is a useful intervention.
