
AI Safety Subprojects

Sep 20, 2021 by Stuart Armstrong

These are AI safety subprojects designed to elucidate "model splintering" and "learning the preferences of irrational agents".

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering
AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism
AI learns betrayal and how to avoid it
Force neural nets to use models, then detect these
Preferences from (real and hypothetical) psychology papers
Finding the multiple ground truths of CoinRun and image classification