
### TLDR

We wrote a 20-page document that explains IDA and outlines potential Machine Learning projects about IDA. This post gives an overview of the document.

### What is IDA?

Iterated Distillation and Amplification (IDA) is a method for training ML systems to solve challenging tasks. It was introduced by Paul Christiano. IDA is intended for tasks where:

• The goal is to outperform humans at the task or to solve instances that are too hard for humans.

• It is not feasible to provide demonstrations or reward signals sufficient for super-human performance at the task.

• Humans have a high-level understanding of how to approach the task and can reliably solve easy instances.

The idea behind IDA is to bootstrap using an approach similar to AlphaZero, but with a learned model of human reasoning steps in place of the fixed game simulator.
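Schematically, each round of this bootstrap amplifies the current model by decomposing a question the way a human would, then distills the amplified behavior back into a fast model. A minimal sketch of that loop (all function names here are hypothetical placeholders, not from the document):

```python
def amplify(model, question, decompose, combine):
    """Answer a hard question by decomposing it into sub-questions
    and answering each sub-question with the fast model."""
    subquestions = decompose(question)
    subanswers = [model(q) for q in subquestions]
    return combine(question, subanswers)

def ida_iteration(model, questions, decompose, combine, distill):
    """One round of IDA: amplification generates training targets,
    distillation trains a fast model to imitate the amplified answers."""
    targets = [amplify(model, q, decompose, combine) for q in questions]
    return distill(questions, targets)
```

In AlphaZero the analogue of `amplify` is tree search against a game simulator; in IDA the decomposition and recombination steps are themselves learned from humans.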

Our document provides a self-contained technical description of IDA. For broader discussion of IDA and its relevance to value alignment, see Ought's presentation, Christiano's blogpost, and the Debate paper. There is also a technical ML paper applying IDA to algorithmic problems (e.g. shortest path in a graph).

### ML Projects on IDA

Our document outlines three Machine Learning projects on IDA. Our goal in outlining these projects is to generate discussion and encourage research on IDA. We are not (as of June 2019) working on these projects, but we are interested in collaboration. The project descriptions are “high-level” and leave many choices undetermined. If you took on a project, part of the work would be refining the project and fixing a concrete objective, dataset and model.

#### Project 1: Amplifying Mathematical Reasoning

This project is about applying IDA to problems in mathematics. This would involve learning to solve math problems by breaking them down into easier sub-problems. The problems could be represented in a formal language (as in this paper) or in natural language. We discuss a recent dataset of high-school problems in natural language, which was introduced in this paper. Here are some examples from the dataset:

Question: Let u(n) = -n^3 - n^2. Let e(c) = -2*c^3 + c. Let f(j) = -118*e(j) + 54*u(j). What is the derivative of f(a)?

Answer: 546*a^2 - 108*a - 118

Question: Three letters picked without replacement from qqqkkklkqkkk. Give probability of sequence qql.

The paper showed impressive results on the dataset for a Transformer model trained by supervised learning (sequence-to-sequence). This suggests that a similar model could do well at learning to solve these problems by decomposition.
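Both example answers can be checked mechanically with a few lines of standard-library Python (a sketch for verification only, not part of the dataset tooling):

```python
from fractions import Fraction
from collections import Counter

# First example. Polynomials as coefficient lists, ascending powers.
u = [0, 0, -1, -1]        # u(n) = -n^3 - n^2
e = [0, 1, 0, -2]         # e(c) = -2*c^3 + c

f = [-118 * ec + 54 * uc for ec, uc in zip(e, u)]   # f = -118*e + 54*u
df = [i * c for i, c in enumerate(f)][1:]           # differentiate
print(df)  # [-118, -108, 546], i.e. 546*a^2 - 108*a - 118

# Second example: P(sequence "qql") when drawing without replacement.
pool = Counter("qqqkkklkqkkk")                      # q: 4, k: 7, l: 1
p, n = Fraction(1), sum(pool.values())
for ch in "qql":
    p *= Fraction(pool[ch], n)
    pool[ch] -= 1
    n -= 1
print(p)  # 1/110
```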

#### Project 2: IDA for Neural Program Interpretation

There’s a research program in Machine Learning on “Neural Program Interpretation” (NPI). Work on NPI focuses on learning to reproduce the behavior of computer programs. One possible approach is to train end-to-end on input-output behavior. However, in NPI a model is instead trained to mimic the program’s internal behavior, including all the low-level operations and the high-level procedures which invoke them.

NPI has some similar motivations to IDA. This project applies IDA to the kinds of tasks explored in NPI and compares IDA to existing approaches. Tasks could include standard algorithms (e.g. sorting), algorithms that operate with databases, and algorithms that operate on human-readable inputs (e.g. text, images).
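To make “mimicking internal behavior” concrete, here is a sketch of how execution traces for a standard algorithm (bubble sort) could be recorded as supervision. The trace format is illustrative, not taken from any particular NPI paper:

```python
def traced_bubble_sort(xs):
    """Sort a list while recording the low-level operations
    (compare, swap) that an NPI-style model would be trained to imitate."""
    xs, trace = list(xs), []
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            trace.append(("compare", j, j + 1))
            if xs[j] > xs[j + 1]:
                trace.append(("swap", j, j + 1))
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs, trace

sorted_xs, trace = traced_bubble_sort([3, 1, 2])
print(sorted_xs)  # [1, 2, 3]
```

An end-to-end model would be trained only on the pair `([3, 1, 2], [1, 2, 3])`; an NPI-style model is also trained on the `trace` of intermediate operations.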

#### Project 3: Adaptive Computation

The idea of “adaptive computation” is to vary the amount of computation you perform for different inputs. You want to apply more computation to inputs that are hard but solvable.

Adaptive computation seems important for the kinds of problems IDA is intended to solve, including some of the problems in Projects 1 and 2. This project would investigate different approaches to adaptive computation for IDA. The basic idea is to decide whether to rely only on the distilled model (which is fast but approximate) or to additionally use amplification (which is more accurate but slower). This decision could be based on a calibrated model or based on a learned policy for choosing whether to use amplification.
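The calibrated-model version of this decision rule can be sketched as a simple threshold policy (the function names and the threshold are hypothetical placeholders, not from the document):

```python
def adaptive_answer(question, distilled, confidence, amplify, threshold=0.9):
    """Use the fast distilled model when its calibrated confidence is high;
    fall back to slower amplification otherwise."""
    answer = distilled(question)
    if confidence(question, answer) >= threshold:
        return answer          # cheap path: distilled model only
    return amplify(question)   # expensive path: amplified reasoning
```

The learned-policy variant would replace the fixed threshold with a policy trained to trade off accuracy against the cost of amplification.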


Regarding the following passage from the document:

> What kind of built-in operations and environments should we use?
> In existing work on NPI, the neural net is given outputs that correspond to basic operations on data. This makes it easier to learn algorithms that depend on those basic operations. For IDA, it would be ideal to learn these operations from examples. (If we were learning from human decompositions, we might not know about these “basic operations on data” ahead of time.)

Do you have ideas/intuitions about how "basic operations" in the human brain can be learned? Also, how basic are the "basic operations" you're thinking about here? (Are we talking about something like the activity of an individual biological neuron? How active is a particular area in the prefrontal cortex? Symbolic-level stuff?)

Generally, do you consider imitating human cognition at the level of "basic operations" to be part of the IDA agenda? (As opposed to, say, training a model to "directly" predict the output of a human-working-for-10-minutes).