
AI Safety Subprojects

Sep 20, 2021 by Stuart Armstrong

These are AI safety subprojects designed to elucidate "model splintering" and "learning the preferences of irrational agents".

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering
AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism
AI learns betrayal and how to avoid it
Force neural nets to use models, then detect these
Preferences from (real and hypothetical) psychology papers
Finding the multiple ground truths of CoinRun and image classification