AI ALIGNMENT FORUM

Cameron Berg

Currently doing alignment and digital minds research @AE Studio 

Meta AI Resident '23, Cognitive science @ Yale '22, SERI MATS '21, LTFF grantee. 

Very interested in work at the intersection of AI x cognitive science x alignment x philosophy.

Comments
Reducing LLM deception at scale with self-other overlap fine-tuning
Cameron Berg · 7mo · 10

Makes sense, thanks. Can you also briefly clarify what exactly you are pointing at with 'syntactic'? It seems this could be interpreted in multiple plausible ways, and others appear to have a similar question.

Reducing LLM deception at scale with self-other overlap fine-tuning
Cameron Berg · 7mo · 37

The idea of combining SOO (self-other overlap) and CAI (Constitutional AI) is interesting. Can you elaborate on what you were imagining here? There seem to be several plausible ways to inject SOO-style fine-tuning into standard CAI; is there a specific direction you are particularly excited about?

Sequences

Paradigm-Building for AGI Safety Research

Posts
29 · Mistral Large 2 (123B) seems to exhibit alignment faking · 7mo · 0
32 · Reducing LLM deception at scale with self-other overlap fine-tuning · 7mo · 9
20 · Self-prediction acts as an emergent regularizer · 1y · 0
59 · Self-Other Overlap: A Neglected Approach to AI Alignment · 1y · 7
16 · There Should Be More Alignment-Driven Startups · 1y · 6
33 · Survey for alignment researchers! · 2y · 3
10 · Paradigm-building: Introduction · 4y · 0
25 · Theoretical Neuroscience For Alignment Theory · 4y · 7