AI ALIGNMENT FORUM
Marc Carauleanu

AI Safety Researcher @AE Studio 

Currently researching self-other overlap, a neglected prior for cooperation and honesty inspired by the cognitive neuroscience of altruism, in state-of-the-art ML models.

Previously: SRF @ SERI '21, MLSS & Student Researcher @ CAIS '22, and LTFF grantee.

LinkedIn 

Comments

Should we rely on the speed prior for safety?
Marc Carauleanu · 4y

Any n-bit hash function will produce collisions once the number of elements exceeds 2^n, the number of distinct hashes representable in n bits. Adding new elements beyond that point requires rehashing with a wider hash to avoid collisions, which gives the GLUT logarithmic time complexity in the limit. Meta-learning can likewise achieve constant time complexity for an arbitrarily large number of tasks, but not in the limit, assuming a finite neural network.
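The pigeonhole argument above can be sketched in a few lines. This is a minimal illustration, not anything from the post: it uses a truncated SHA-256 as a stand-in for an arbitrary n-bit hash, and the key names (`key-0`, `key-1`, ...) are hypothetical inputs; any family of distinct keys would do.

```python
import hashlib

def n_bit_hash(key: str, n: int) -> int:
    """Truncate SHA-256 to n bits, standing in for any n-bit hash."""
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return digest % (2 ** n)

def first_collision(n: int):
    """Insert distinct keys until two share an n-bit hash.

    By pigeonhole, a collision is guaranteed within 2**n + 1 inserts,
    so the loop always terminates.
    """
    seen = {}  # hash value -> key that produced it
    i = 0
    while True:
        key = f"key-{i}"
        h = n_bit_hash(key, n)
        if h in seen:
            return seen[h], key, i + 1  # colliding pair, inserts used
        seen[h] = key
        i += 1

a, b, inserted = first_collision(8)
# With an 8-bit hash, the colliding pair must appear within 257 inserts.
```

Once the table is full, avoiding collisions means re-hashing every stored element into a wider hash, and the required hash width grows with log of the element count, which is where the logarithmic factor comes from.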

Posts

Mistral Large 2 (123B) seems to exhibit alignment faking (29 points, 7mo, 0 comments)
Reducing LLM deception at scale with self-other overlap fine-tuning (32 points, 7mo, 9 comments)
Self-Other Overlap: A Neglected Approach to AI Alignment (59 points, 1y, 7 comments)
Should we rely on the speed prior for safety? (10 points, 4y, 4 comments)