Other versions of "No free lunch in value learning"

by Stuart Armstrong
25th Feb 2020

Here are some more general results derived from "Occam's razor is insufficient to infer the preferences of irrational agents":

  • Regularisation is insufficient to make inverse reinforcement learning work.
  • Unsupervised learning cannot deduce human preferences; at the very least, you need semi-supervised learning.
  • Human theory of mind cannot be deduced merely by observing humans.
  • When programmers "correct an error/bug" in a value-learning system, they are often injecting their own preferences into it.
  • The implicit and explicit assumptions made for a value-learning system determine what values the system will learn.
  • No simple definition can distinguish a bias from a preference, unless it connects with human judgement.

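To make the first and last of these points concrete, here is a minimal toy sketch (purely illustrative; the states, actions and numbers are invented, not taken from the paper). Three different (planner, reward) decompositions, one rational, one anti-rational with the reward negated, and one that ignores a flat reward entirely, all reproduce the same observed behaviour, so observation and simplicity alone cannot pick out the intended reward:

```python
# Hypothetical three-state example: several (planner, reward) pairs
# fit the observed behaviour equally well.

# Observed behaviour: in each state the human picks one of two actions.
observed_policy = {"s0": "work", "s1": "rest", "s2": "work"}
actions = ("work", "rest")
states = ("s0", "s1", "s2")

# The intended reward: working is preferred in s0/s2, resting in s1.
intended_reward = {
    ("s0", "work"): 1, ("s0", "rest"): 0,
    ("s1", "work"): 0, ("s1", "rest"): 1,
    ("s2", "work"): 1, ("s2", "rest"): 0,
}

def rational(reward):
    """Fully rational planner: pick the action with the highest reward."""
    return {s: max(actions, key=lambda a: reward[(s, a)]) for s in states}

def anti_rational(reward):
    """Anti-rational planner: pick the action with the lowest reward."""
    return {s: min(actions, key=lambda a: reward[(s, a)]) for s in states}

def indifferent(reward):
    """Planner that ignores the reward and just replays the observed data."""
    return dict(observed_policy)

decompositions = [
    ("rational + intended reward", rational, intended_reward),
    ("anti-rational + negated reward", anti_rational,
     {k: -v for k, v in intended_reward.items()}),
    ("indifferent + flat reward", indifferent,
     {k: 0 for k in intended_reward}),
]

for name, planner, reward in decompositions:
    print(name, "fits the data:", planner(reward) == observed_policy)
# All three lines print True: the data cannot tell the decompositions
# apart, and no obvious simplicity measure favours the intended one.
```
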
These results are all true; the key question is to what extent. Do we need only minimal supervision, or a few added assumptions, for the AI to deduce our values correctly? After all, we produce a lot of data that it could use to learn our values, if it gets the basic assumptions right.

Or do we need to put a lot more work into the assumptions? After all, most of the data we produce is made by humans, for humans, who can fill in the implicit parts of the data for themselves; hence there are few explicit labels for the AI to use.

My feeling is that we probably only need a few assumptions - but maybe more than some optimists in this area believe.
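
Continuing the toy sketch above (equally hypothetical), here is one way a single extra assumption can already do a lot of work: labelling one observed choice as a genuine preference rules out the anti-rational and indifferent decompositions.

```python
# Continuing the hypothetical example above: one labelled judgement,
# "choosing 'rest' in s1 is a preference, not a bias", is enough extra
# information to reject two of the three decompositions.

labelled = {("s1", "rest"): "preference"}

def consistent(reward, labels):
    """Keep a decomposition only if each labelled preference gets a
    strictly higher reward than the alternative action in that state."""
    for (s, a), kind in labels.items():
        other = "work" if a == "rest" else "rest"
        if kind == "preference" and reward[(s, a)] <= reward[(s, other)]:
            return False
    return True

for name, _planner, reward in decompositions:
    print(name, "survives the label:", consistent(reward, labelled))
# Only "rational + intended reward" survives, which is the sense in which
# a few well-chosen assumptions or labels might go a long way.
```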
