
AI Safety Subprojects

Sep 20, 2021 by Stuart_Armstrong

These are AI safety subprojects designed to elucidate "model splintering" and "learning the preferences of irrational agents".

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

AI learns betrayal and how to avoid it

Force neural nets to use models, then detect these

Preferences from (real and hypothetical) psychology papers

Finding the multiple ground truths of CoinRun and image classification