Oracle design as de-black-boxer.

Stuart_Armstrong

2nd Sep 2016

This paper introduces an interpreter for learning algorithms, intended to clarify what is happening inside them.

The interpreter, Local Interpretable Model-agnostic Explanations (LIME), gives the human user some idea of the important factors going into the learning algorithm's decision, such as:
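
To give a feel for how LIME arrives at such factors, here is a minimal Python sketch of its core idea for continuous inputs. It perturbs the instance, queries the black box, weights each sample by how close it is to the instance, and fits a weighted linear surrogate whose coefficients serve as local importance weights. This is a simplification under stated assumptions, not the paper's implementation. Gaussian perturbations, an exponential proximity kernel, and a lightly ridge-penalised least-squares fit stand in for the paper's sampling scheme and sparse linear model. The names `lime_explain`, `solve_linear` and `toy_black_box` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lime_explain(
    black_box: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    n_samples: int = 500,
    noise: float = 1.0,
    width: float = 2.0,
    ridge: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Return one local importance weight per input feature (simplified LIME-style sketch)."""
    rng = random.Random(seed)
    p = len(instance) + 1  # +1 for the intercept column
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for _ in range(n_samples):
        # 1. Perturb the instance with Gaussian noise (assumption: continuous features).
        z = [xi + rng.gauss(0.0, noise) for xi in instance]
        # 2. Weight the sample by proximity to the instance (exponential kernel).
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        w = math.exp(-dist2 / width ** 2)
        # 3. Query the black box and accumulate the weighted normal equations.
        y = black_box(z)
        row = [1.0] + z
        for i in range(p):
            xtwy[i] += w * row[i] * y
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    for i in range(1, p):  # small ridge penalty; the intercept is left unpenalised
        xtwx[i][i] += ridge
    beta = solve_linear(xtwx, xtwy)
    return beta[1:]  # drop the intercept; the rest are local feature weights


def toy_black_box(x: Sequence[float]) -> float:
    """Hypothetical model: locally dominated by feature 0, then feature 1."""
    return 4.0 * x[0] - 2.0 * x[1] + 0.1 * x[2] ** 2


if __name__ == "__main__":
    weights = lime_explain(toy_black_box, [1.0, 1.0, 0.0])
    ranked = sorted(range(len(weights)), key=lambda k: abs(weights[k]), reverse=True)
    for k in ranked:
        print(f"feature {k}: weight {weights[k]:+.3f}")
```

Near the instance `[1.0, 1.0, 0.0]`, the fitted weights should come out close to the toy model's local slopes (roughly 4, -2 and 0). Feature 0 would then be reported as the most important factor behind that particular prediction.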

We can use the first Oracle design for a similar purpose: giving clarity. The "Counterfactually Unread Agent" can already be used to see which value of a specific random variable will maximise a given utility function. We could also search across a whole slew of random variables to see which one is most important for maximising that utility function, giving outputs like this:
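
A minimal Python sketch of that search follows. It rests on loudly simplified assumptions. The oracle's answer is modelled as an intervention that fixes one variable at a candidate value, and expected utility is estimated by Monte Carlo sampling over the other variables. The counterfactual (unread-answer) mechanism of the actual Oracle design is not modelled at all. Variables are then ranked by how much setting each one optimally raises expected utility above a no-intervention baseline. The toy utility, the variables `a`, `b`, `c`, their candidate values, and all function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

World = Dict[str, float]
Sampler = Callable[[random.Random], float]


def expected_utility(
    utility: Callable[[World], float],
    samplers: Dict[str, Sampler],
    fixed: World,
    n_samples: int,
    seed: int,
) -> float:
    """Monte Carlo estimate of E[utility] with the variables in `fixed` held at given values."""
    rng = random.Random(seed)  # same seed for every call = common random numbers
    total = 0.0
    for _ in range(n_samples):
        world = {name: sample(rng) for name, sample in samplers.items()}
        world.update(fixed)  # the modelled "answer": override the chosen variable
        total += utility(world)
    return total / n_samples


def best_value(
    utility: Callable[[World], float],
    samplers: Dict[str, Sampler],
    variable: str,
    candidates: Sequence[float],
    n_samples: int,
    seed: int,
) -> Tuple[float, float]:
    """Candidate value of `variable` that maximises estimated expected utility."""
    scored = [
        (expected_utility(utility, samplers, {variable: v}, n_samples, seed), v)
        for v in candidates
    ]
    eu, value = max(scored)
    return value, eu


def rank_variables(
    utility: Callable[[World], float],
    samplers: Dict[str, Sampler],
    candidates: Dict[str, Sequence[float]],
    n_samples: int = 2000,
    seed: int = 0,
) -> List[Tuple[str, float, float]]:
    """Rank variables by how much setting them optimally raises expected utility."""
    baseline = expected_utility(utility, samplers, {}, n_samples, seed)
    results = []
    for variable, values in candidates.items():
        value, eu = best_value(utility, samplers, variable, values, n_samples, seed)
        results.append((variable, value, eu - baseline))
    results.sort(key=lambda r: r[2], reverse=True)
    return results


# Hypothetical toy world: three independent Uniform(0, 1) variables.
TOY_SAMPLERS: Dict[str, Sampler] = {
    "a": lambda rng: rng.random(),
    "b": lambda rng: rng.random(),
    "c": lambda rng: rng.random(),
}
TOY_CANDIDATES: Dict[str, Sequence[float]] = {name: [0.0, 0.5, 1.0] for name in TOY_SAMPLERS}


def toy_utility(world: World) -> float:
    """Hypothetical utility function over the toy world."""
    return 3.0 * world["a"] + 0.5 * world["b"] - world["c"] ** 2


if __name__ == "__main__":
    for name, value, gain in rank_variables(toy_utility, TOY_SAMPLERS, TOY_CANDIDATES):
        print(f"{name}: best value {value}, expected-utility gain {gain:+.3f}")
```

For this toy utility, fixing `a` at 1 should give the largest gain (about 1.5). Fixing `c` at 0 comes next (about 0.33), then `b` at 1 (about 0.25). Reusing the same random seed for every estimate keeps the comparison between candidates from being swamped by sampling noise.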

AI ALIGNMENT FORUM