AI ALIGNMENT FORUM

Diogo de Lucena

Chief Scientist at AE Studio

Posts

Mistral Large 2 (123B) seems to exhibit alignment faking (29 karma, 5mo ago, 0 comments)
Reducing LLM deception at scale with self-other overlap fine-tuning (32 karma, 6mo ago, 9 comments)
Self-prediction acts as an emergent regularizer (20 karma, 10mo ago, 0 comments)
Self-Other Overlap: A Neglected Approach to AI Alignment (59 karma, 1y ago, 7 comments)