Thanks for the clarification, Ajeya! Sorry to make you explain that. It was a mistake to imply that ARC’s conception is specifically anchored on Bayes nets; the report was quite clear that it isn’t.

Curated. The authors write:

We believe that there are many promising and unexplored approaches to this problem, and there isn’t yet much reason to believe we are stuck or are faced with an insurmountable obstacle.

If it's true that this is both a core alignment problem and we're not stuck on it, then that's fantastic. I am not an alignment researcher and don't feel qualified to comment on quite how promising this work seems, but I find the report both accessible and compelling. I recommend it to anyone curious about where some of the alignment leading edge is.

Also, I find a striking resemblance to MIRI's proposed visible thoughts project. They appear to be getting at the same thing, though via quite different models (i.e. Bayes nets vs. language models). It'd be amazing if both projects flourished and understanding could be combined from each.

Curated. A few weeks ago I curated the post this is a response to. I'm excited to see a response that argues the criticized report was misinterpreted/misrepresented. I'd be even more excited to see a response to the response–by the authors involved so far or anyone else. Someone once said (approx) that successful conversation must permit four levels: saying X, critiquing X, critiquing the critique, and critiquing the critique of the critique. We're at 3 out of 4.

The Plan

Curated. Not that many people pursue agendas to solve the whole alignment problem, and of those, even fewer write up their plan clearly. I really appreciate this kind of document and would love to see more like it. Shoutout to the back-and-forth between John and Scott Garrabrant about John's characterization of MIRI and its relation to John's work.

How uniform is the neocortex?

This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI". In other words, there don't necessarily have to be complicated fine-tuned algorithms for different advanced functions; you can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.

Biology-Inspired AGI Timelines: The Trick That Never Works

Curated. Many times over the years I've seen analogies from biology used to produce estimates about AI timelines. This is the most thoroughly-argued case I've seen against them. While I believe some find the format uncomfortable, I'm personally glad to see Eliezer expressing his beliefs as he feels them, and think this is worth reading for anyone interested in predicting how AI will play out in coming years.

For those short on time, I recommend this summary by Grant Demaree.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Curated. Although this isn't a LessWrong post, it seems like a notable result for AGI progress. Also see this highly-upvoted, accessible explanation of why EfficientZero is a big deal. Lastly, I recommend the discussion in the comments here.

The theory-practice gap

Curated. This post introduces a useful frame for thinking about different kinds of alignment work and related differences of opinion.

