Oracle design as de-black-boxer.

by Stuart Armstrong, 2nd Sep 2016



This paper introduces an interpreter for learning algorithms, intended to clarify what is happening inside the algorithm.

The interpreter, Local Interpretable Model-agnostic Explanations (LIME), gives the human user some idea of the important factors going into the learning algorithm's decision, such as:

We can use the first Oracle design here for similar purposes, namely to give clarity. The "Counterfactually Unread Agent" can already be used to see which values of a specific random variable will maximise a certain utility function. We could also search across a whole slew of random variables to see which one is most important in maximising the given utility function, giving outputs like this:
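As a toy illustration of the search over random variables described above, here is a minimal Python sketch. It is not the Counterfactually Unread Agent itself and does not model its counterfactual mechanism; it only shows the "which variable matters most for this utility" ranking on a made-up world. The variable names `X`, `Y`, `Z`, the `utility` function, and the "spread" importance measure are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy world: three independent binary random variables,
# each uniform on {0, 1}. These names are illustrative assumptions.
VARIABLES = ["X", "Y", "Z"]
VALUES = [0, 1]


def utility(world):
    # Hypothetical utility: X matters most, Y a little, Z not at all.
    return 3.0 * world["X"] + 1.0 * world["Y"] + 0.0 * world["Z"]


def expected_utility_given(name, value):
    """Exact E[utility | name = value], averaging uniformly over the other variables."""
    others = [v for v in VARIABLES if v != name]
    total = 0.0
    count = 0
    for combo in product(VALUES, repeat=len(others)):
        world = dict(zip(others, combo))
        world[name] = value
        total += utility(world)
        count += 1
    return total / count


def rank_variables():
    """For each variable, find its utility-maximising value and how much its value matters.

    Importance here is the spread between the best and worst conditional
    expected utility; this measure is an assumption of the sketch.
    """
    rows = []
    for name in VARIABLES:
        by_value = {v: expected_utility_given(name, v) for v in VALUES}
        best_value = max(by_value, key=by_value.get)
        spread = max(by_value.values()) - min(by_value.values())
        rows.append((name, best_value, spread))
    # Most important variable first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for name, best_value, spread in rank_variables():
        print(f"{name}: best value = {best_value}, utility spread = {spread:.2f}")
```

In this toy setup the ranking puts `X` first, since fixing it changes expected utility the most, and `Z` last, since the utility ignores it. A real Oracle would face variables whose distributions and interactions are not known in advance, so exact enumeration like this would typically be replaced by some form of estimation.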
