Diogo de Lucena — AI Alignment Forum
Chief Scientist at AE Studio
Posts
Sorted by New
Mistral Large 2 (123B) seems to exhibit alignment faking (29 karma, 8mo, 0 comments)
Reducing LLM deception at scale with self-other overlap fine-tuning (32 karma, 9mo, 9 comments)
Self-prediction acts as an emergent regularizer (20 karma, 1y, 0 comments)
Self-Other Overlap: A Neglected Approach to AI Alignment (60 karma, 1y, 7 comments)