This post is excellent, in that it has a very high importance-to-word-count ratio. It'll take up only a page or so, but convey a very useful and relevant idea, and moreover ask an important question that will hopefully stimulate further thought.
If this post is selected, I'd like to see the followup made into an addendum—I think it adds a very important piece, and it should have been nominated itself.
I think this post and Evan's summary of Chris Olah's views are essential both in their own right and as mutual foils to MIRI's research agenda. We see related concepts (mesa-optimization originally came out of Paul's talk of daemons in Solomonoff induction, if I remember right) but very different strategies for achieving both inner and outer alignment. (The crux of the disagreement seems to be the probability of success from adapting current methods.)
Strongly recommended for inclusion.
It's hard to know how to judge a post that deems itself superseded by a post from a later year, but I lean toward taking Daniel at his word and hoping we survive until the 2021 Review comes around.
The content here is very valuable, even if the genre of "I talked a lot with X and here's my articulation of X's model" comes across to me as a weird intellectual ghostwriting. I can't think of a way around that, though.
I think I have juuust enough background to follow the broad strokes of this post, but not to quite grok the parts I think Abram was most interested in.
It definitely caused me to think about credit assignment. I actually ended up thinking about it largely through the lens of Moral Mazes (where challenges of credit assignment combine with other forces to create a really bad environment). Re-reading this post, while I don't quite follow everything, I do successfully get a taste of how credit assignment fits into a bunch of different domains.
For the "myop...
For me, this is the paper where I learned to connect ideas about delegation to machine learning. The paper sets up simple ideas of mesa-optimizers, and shows a number of constraints and variables that will determine how the mesa-optimizers will be developed – in some environments you want to do a lot of thinking in advance then delegate execution of a very simple algorithm to do your work (e.g. this simple algorithm Critch developed that my group house uses to decide on the rent for each room), and in some environments you want to do a little thinking and ...
Note 1: This review is also a top-level post.
Note 2: I think that 'robust instrumentality' is a more apt name for 'instrumental convergence.' That said, for backwards compatibility, this comment often uses the latter.
In the summer of 2019, I was building up a corpus of basic reinforcement learning theory. I wandered through a sun-dappled Berkeley, my head in the clouds, my mind bent on a single ambition: proving the existence of instrumental convergence.
I needed to find the right definitions first, and I couldn't even imagine what...
More than a year since writing this post, I would still say it represents the key ideas in the sequence on mesa-optimisation which remain central in today's conversations on mesa-optimisation. I still largely stand by what I wrote, and recommend this post as a complement to that sequence for two reasons:
First, skipping some detail allows it to focus on the important points, making it better-suited than the full sequence for obtaining an overview of the area.
Second, unlike the sequence, it deemphasises the mechanism of optimisation, and explicitly cas...
I have now linked at least 10 times to the "'Generate evidence of difficulty' as a research purpose" section of this post. It was a thing that I kind of wanted to point to before this post came out, but felt confused about it, and this post finally gave me a pointer to it.
I think that section was substantially more novel and valuable to me than the rest of this post, but it is also evidence that others might not have had some of the other ideas on their map, and so they might have found it similarly valuable because of a different section.
So, this was apparently in 2019. Given how central the ideas have become, it definitely belongs in the review.
This is perhaps one of the most striking fundamental discoveries of machine learning in the past 20 years, and Evan's post is well-deserving of a nomination for explaining it to LW.
This post is a great tutorial on how to run a research group.
My main complaint about it is that it had the potential to be a way more general post that was obviously relevant to anyone building a serious intellectual community, but the framing makes it feel only relevant to Alignment research.
This post is a good description of a way of thinking about the usefulness of transparency and interpretability for AI alignment that I think is underrated by the LW-y AI safety community.
I think this post was a valuable contribution both to our understanding of instrumental convergence and to making instrumental convergence rigorous enough to stand up to more intense outside scrutiny.
A formal answer (or at least, major contribution) to a previously informal but significant debate in AI safety.
I know it’s already been nominated twice, but I still want to nominate it again. This sequence (I’m nominating the sequence) helped me think clearly about optimization, how delegation works between an optimizer and a mesa-optimizer, and what constraints lie between them (e.g. when does an optimizer want a system it’s developing to do optimization?). It changed a lot of the basic ways in which I think about optimization and AI.
This is a rather clever parable which explains serious AI alignment problems in an entertaining form that doesn't detract from the substance.
This is one of the scarier posts I've read on LW. I feel kinda freaked out by this post. It's an important technical idea.
Seconding Neel Nanda's nomination.