AI ALIGNMENT FORUM

Marc Carauleanu

AI Safety Researcher @AE Studio 

Currently researching self-other overlap — a neglected prior for cooperation and honesty, inspired by the cognitive neuroscience of altruism — in state-of-the-art ML models.

Previously SRF @ SERI '21, MLSS & Student Researcher @ CAIS '22, and LTFF grantee.

LinkedIn 

Posts



Comments

Should we rely on the speed prior for safety?
Marc Carauleanu · 4y

Any n-bit hash function will produce collisions once the number of elements in the hash table exceeds the 2^n possible hash values. Adding new elements beyond that point therefore requires rehashing with a wider hash to avoid collisions, which gives the GLUT logarithmic time complexity in the limit. Meta-learning can likewise achieve constant time complexity for an arbitrarily large number of tasks, but not in the limit, assuming a finite neural network.
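The pigeonhole argument above can be sketched in a few lines of Python: a toy n-bit hash (a truncated SHA-256 digest, chosen here purely for illustration — the comment does not specify a hash function) has at most 2^n distinct outputs, so any set of more than 2^n distinct keys is guaranteed to contain a collision.

```python
import hashlib


def hash_n_bits(key: str, n: int = 8) -> int:
    """Toy n-bit hash: truncate a SHA-256 digest to n bits."""
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    return digest % (1 << n)


def has_collision(keys, n: int = 8) -> bool:
    """Return True if any two keys share the same n-bit hash."""
    seen = set()
    for k in keys:
        h = hash_n_bits(k, n)
        if h in seen:
            return True
        seen.add(h)
    return False


# 2**8 + 1 = 257 distinct keys cannot all map to distinct 8-bit hashes,
# so a collision is guaranteed by the pigeonhole principle.
keys = [f"key-{i}" for i in range(257)]
assert has_collision(keys, n=8)
```

Avoiding such collisions forces the hash width n to grow with the table size, which is where the logarithmic factor comes from.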

29 · Mistral Large 2 (123B) seems to exhibit alignment faking · 6mo · 0 comments
32 · Reducing LLM deception at scale with self-other overlap fine-tuning · 6mo · 9 comments
59 · Self-Other Overlap: A Neglected Approach to AI Alignment · 1y · 7 comments
10 · Should we rely on the speed prior for safety? · 4y · 4 comments