Oracle design as de-black-boxer.

by Stuart Armstrong, 2nd Sep 2016



This paper introduces an interpreter for learning algorithms, for the purpose of clarifying what is happening inside the algorithm.

The interpreter, Local Interpretable Model-agnostic Explanations (LIME), gives the human user some idea of the important factors going into the learning algorithm's decision, such as:

We can use the first Oracle design here for similar purposes, that is, for giving clarity. The "Counterfactually Unread Agent" can already be used to see which values of a specific random variable will maximise a certain utility function. We could also search across a whole slew of random variables to see which one is most important in maximising the given utility function, giving outputs like this:
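As a rough illustration of that kind of search, here is a minimal toy sketch. It is not the Oracle design itself: it simply brute-forces a tiny, fully known model. The three binary variables, the `utility` function, and the helper names `expected_utility` and `rank_variables` are all hypothetical, chosen only to make the idea concrete. For each variable it finds the value that maximises expected utility when that variable alone is pinned, then ranks the variables by how much pinning them gains over the baseline.

```python
import itertools

# Hypothetical toy setup: three binary random variables, each uniform on {0, 1}.
VARIABLES = ["x", "y", "z"]


def utility(x, y, z):
    # Hypothetical utility, for illustration only:
    # x matters most, y a little, z not at all.
    return 3 * x + 1 * y + 0 * z


def expected_utility(fixed):
    """Expected utility when the variables in `fixed` are pinned to given
    values and every other variable is uniform over {0, 1}."""
    free = [v for v in VARIABLES if v not in fixed]
    total = 0.0
    count = 0
    for values in itertools.product([0, 1], repeat=len(free)):
        assignment = dict(fixed)
        assignment.update(zip(free, values))
        total += utility(**assignment)
        count += 1
    return total / count


def rank_variables():
    """Return (name, best_value, gain) triples, largest gain first.

    `gain` is the expected utility with the variable pinned to its best
    value, minus the baseline with nothing pinned."""
    baseline = expected_utility({})
    results = []
    for name in VARIABLES:
        best_value = max([0, 1], key=lambda v: expected_utility({name: v}))
        gain = expected_utility({name: best_value}) - baseline
        results.append((name, best_value, gain))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    for name, best_value, gain in rank_variables():
        print(f"{name}: best value = {best_value}, gain = {gain}")
```

In a real setting the utility and the distribution over variables would not be available for enumeration like this; the point of the Oracle would be to supply the maximising value for each variable, with the ranking step done on top of its answers.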
