AI ALIGNMENT FORUM

1560
Diogo de Lucena

Chief Scientist at AE Studio

Posts

29 | Mistral Large 2 (123B) seems to exhibit alignment faking | 7mo | 0
32 | Reducing LLM deception at scale with self-other overlap fine-tuning | 7mo | 9
20 | Self-prediction acts as an emergent regularizer | 1y | 0
59 | Self-Other Overlap: A Neglected Approach to AI Alignment | 1y | 7