Statistical learning theory and robust concept learning

by jessicata

7th Mar 2015


Stuart_Armstrong

This is interesting. What would be useful, I feel, would be to take a few hypothetical failure examples (a paperclipper that optimises the universe, a "cure cancer" AI that kills everyone, a spam filter that shuts down the internet), and see exactly how they fail in this setup. Be careful not to feed in the failure by hand. Then we could see if this can be generalised (i.e. whether, given a random new AI design, we could rapidly detect the likely failure points).

My suspicion is that the choice of $H$ is doing a huge amount of the work here.

jessicata

Yeah, I think the main problem with active learning is that $H$ is either hard to split evenly (by querying individual points $x$) or is not very general. If we somehow created a nice $H$ that is both sufficiently general and easy to split, then we might get exchanges like this (but starting with more than 4 hypotheses, of course):

(AI's current hypotheses: cancer is good, cancer is bad, QALYs are good, QALYs are bad)
AI: Is removing a tumor good?
Programmer: Yes.
(AI's current hypotheses: cancer is bad, QALYs are good)
AI: Is killing everyone good?
Programmer: No.
(AI's only hypothesis: QALYs are good)

Obviously this works very badly if $H$ does not contain the hypothesis we really want! Luckily, as long as $H$ does contain the correct hypothesis, and we don't accidentally falsify it, then the system will either determine the correct hypothesis or fail gracefully by reporting that it is uncertain.
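
To make the exchange above concrete, here is a minimal sketch of that kind of loop: keep the set of hypotheses in $H$ consistent with the answers so far, ask the query that splits the remaining set most evenly, and stop when one hypothesis is left or no query can split the rest. Everything in it is an illustrative assumption rather than part of the original proposal: the names `HYPOTHESES`, `QUERIES`, `best_query`, and `learn` are made up, and so is the table of what each toy hypothesis says about each query.

```python
from typing import Callable, Dict, List, Optional

# Toy hypothesis class H. Each hypothesis maps a query to its predicted
# answer to "is this good?". The labels are assumptions for illustration.
HYPOTHESES: Dict[str, Dict[str, bool]] = {
    "cancer is good": {"removing a tumor": False, "killing everyone": False},
    "cancer is bad": {"removing a tumor": True, "killing everyone": True},
    "QALYs are good": {"removing a tumor": True, "killing everyone": False},
    "QALYs are bad": {"removing a tumor": False, "killing everyone": True},
}
QUERIES: List[str] = ["removing a tumor", "killing everyone"]


def best_query(remaining: List[str], queries: List[str]) -> Optional[str]:
    """Pick the query whose yes/no split of `remaining` is most even.

    Returns None when every query gets the same answer from all remaining
    hypotheses, i.e. when no available query can tell them apart.
    """
    n = len(remaining)

    def imbalance(query: str) -> int:
        yes = sum(HYPOTHESES[h][query] for h in remaining)
        return abs(2 * yes - n)  # 0 is a perfect split, n is no split

    query = min(queries, key=imbalance)
    return query if imbalance(query) < n else None


def learn(oracle: Callable[[str], bool], queries: List[str]) -> List[str]:
    """Eliminate hypotheses that disagree with the oracle's answers.

    The result has one element if a hypothesis was identified, several if
    the system must report uncertainty, and none if every hypothesis in H
    was falsified.
    """
    remaining = list(HYPOTHESES)
    while len(remaining) > 1:
        query = best_query(remaining, queries)
        if query is None:
            break  # cannot split further: report the uncertain set
        answer = oracle(query)
        remaining = [h for h in remaining if HYPOTHESES[h][query] == answer]
    return remaining


# The programmer's answers from the dialogue above.
programmer = {"removing a tumor": True, "killing everyone": False}
print(learn(programmer.__getitem__, QUERIES))  # ['QALYs are good']
```

If only the tumor question is available, `learn` returns both "cancer is bad" and "QALYs are good", which is the "report that it is uncertain" case. An empty result would mean every hypothesis in $H$ was falsified, which is the situation where $H$ does not contain the hypothesis we want.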

Stuart_Armstrong

"Killing everyone" seems a very high-level and ambiguous concept.

jessicata

Certainly. This is why any use of concept learning gets into ontology identification issues.

Stuart_Armstrong

Can concept learning help effectively at that level?

jessicata

I think you might be able to use concept learning to extract humans' native ontology (of the type studied in the ontological crisis paper) and values expressed in this ontology. The next step is to make a more rational version of this ontology (e.g. by mapping it to the AI's ontology), which does not look like a concept learning problem.
