(note: this is not an official MIRI statement; it is a personal statement. I am not speaking for others who have been involved with the agenda.)
The AAMLS (Alignment for Advanced Machine Learning Systems) agenda is a MIRI project aimed at determining how hypothetical, highly advanced machine learning systems could be used safely. I previously worked on problems in this agenda and am not currently doing so.
See the paper. The agenda lists 8 theoretical problems relevant to
aligning AI systems substantially similar to current machine learning systems.
Around March 2016, I had thoughts about research prioritization: I thought it
made sense for AI safety researchers to spend more time thinking about machine learning systems.
In a similar timeframe, some other researchers updated towards shorter timelines. I had some
discussions with Eliezer, Paul, Nate, and others, and
came up with a list of problems that seemed useful to think about.
Then some of us (mostly me, with significant help from others) wrote up the paper about the problems. The plan was
for some subset of the researchers to work on them.
Since writing the paper, progress has been slow.
Why was little progress made?
I think the main reason is that the problems were
very difficult. In particular, they were mostly selected on the basis of "this
seems important and seems plausibly solvable", rather than on any strong
intuition that it was possible to make progress.
In comparison, problems in the agent foundations agenda have seen more progress.
One thing to note about these problems is that they were formulated on the basis of
a strong intuition that they ought to be solvable. Before logical induction,
it was possible to have the intuition that some sort of asymptotic approach
could solve many logical uncertainty problems in the limit. It was also
possible to have a strong intuition that some sort of self-trust is possible.
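(To give a concrete sense of what "solving logical uncertainty in the limit" means here, the following is my own gloss of the published logical induction results, not anything from the AAMLS paper: a logical inductor assigns prices $\mathbb{P}_n(\phi)$ to sentences $\phi$ such that

% Asymptotic guarantees of the kind the intuition pointed at:
% prices converge, and agree with provability in the limit.
\lim_{n \to \infty} \mathbb{P}_n(\phi) \ \text{exists for every sentence } \phi,
\qquad
\lim_{n \to \infty} \mathbb{P}_n(\phi) = 1 \ \text{whenever } \phi \text{ is provable (and } 0 \text{ whenever it is refutable)},

with the limiting prices forming a coherent probability distribution. The point is that this sort of limit statement was a recognizable target well before the construction existed.)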
With problems in the AAMLS agenda, the plausibility argument was something like
"this seems important and seems plausibly solvable", which, empirically, turned out not to make for tractable research problems.
In an important sense, the AAMLS agenda is "going for the throat" in a way that
other agendas (e.g. the agent foundations agenda) are to a lesser extent: it
is attempting to solve the whole alignment problem (including goal
specification) given access to resources such as powerful reinforcement
learning. Thus, the difficulties of the whole alignment problem (e.g.
specification of environmental goals) are more exposed in the problems.
Personally, I lean strongly towards theoretical rather than empirical
approaches. I don't know how much I endorse this bias for the set of
people working on AI safety as a whole, but it is definitely a personal bias of mine.
Problems in the AAMLS agenda turned out not to be very amenable to
purely theoretical investigation. This is probably because there
is no clear mathematical aesthetic for determining what counts as a solution
(e.g. for the environmental goals problem, it's not actually clear that there's
a recognizable mathematical statement of what the problem is).
With the agent foundations agenda, there's a
clearer aesthetic for recognizing good solutions. Most of the problems in the
AAMLS agenda have a less clear aesthetic. (There are probably other ways of
investigating the AI alignment problem in a highly aesthetic fashion besides
the agent foundations agenda, but I don't know of them yet.)
Perhaps related to the fact that the problems were so hard, I repeatedly found
that other things felt better to think about and work on than AAMLS.
That is, though I was officially the lead on AAMLS, I mostly did other things in
that time period. I think this was mostly correct (though it unfortunately made
the official story somewhat misleading): I intuitively expect that the other
things I did had a greater payoff than working on AAMLS would have.
I've made some updates (some due to AAMLS, some not) that make AAMLS look like a
worse idea now than before.
As discussed above, I included problems based on plausibility rather than a
strong intuition that the problem is solvable. I've updated against this being
a useful research strategy; I now think strong intuitions that things are
solvable are a better guide to what to work on. Note that strong intuitions
can be miscalibrated; however, even in these cases there is still a strong model behind the
intuition that can be tested by pursuing the research the intuition implies.
I've updated in favor of the proposition that essential AI safety problems
(especially those related to benign induction, bounded logical uncertainty, and
environmental goals) are philosophically hard rather than only
mathematically hard. That is: just taking our current philosophical thinking
and attempting to formalize it will fail, because our current philosophical
thinking is confused.
The main reason for this intuition is that, after thinking about these problems for a significant time, I
notice that, in near mode, I don't expect to be able to find satisfying
solutions (e.g. a particular construction together with a mathematical proof about it
that yields high confidence that it will work; it's hard to imagine what the premises or
conclusions of such a proof would be).
So it looks like large
ontological shifts will be necessary to even get to the stage of picking the
right problems to formalize and solve.
I've moved towards a research approach that is less "rigid" than
working on a particular agenda. Every particular research agenda for AI alignment that
I know of (agent foundations, AAMLS, concrete problems in AI safety, Paul's
agenda) offers a useful perspective on the problem, but is quite limited in
itself. Each agenda does some combination of (a) containing "impossible"
problems and (b) ignoring large parts of the AI safety problem. If the overall
alignment problem is solved, it will probably be solved through researchers obtaining
new, not-currently-existing perspectives on the problem.
In general, I think the purpose of a technical agenda is something like offering a concrete (if limited) perspective on the problem, together with a set of problems through which to develop that perspective.
I've updated against the idea that research should be significantly optimized
for being understandable to outsiders. (I previously considered
understandability a significant point in favor of working on AAMLS, though not
one of the main considerations.) There are intuitions in favor of this type of
research, but I now have additional intuitions against it.
Overall, it still seems like outside understandability is weakly net-positive, but I don't plan to use it as a significant optimization criterion when deciding which research to do (i.e. I'll aim to do research that is good according to my own aesthetic, and then figure out how to make it understandable later).
The "benign induction problem" link is broken.