All of Ruby's Comments + Replies

ARC's first technical report: Eliciting Latent Knowledge

Thanks for the clarification, Ajeya! Sorry to make you have to explain that; it was a mistake to imply that ARC's conception is specifically anchored on Bayes nets, when the report was quite clear that it isn't.

ARC's first technical report: Eliciting Latent Knowledge

Curated. The authors write:

We believe that there are many promising and unexplored approaches to this problem, and there isn’t yet much reason to believe we are stuck or are faced with an insurmountable obstacle.

If it's true that this is both a core alignment problem and that we're not stuck on it, then that's fantastic. I am not an alignment researcher and don't feel qualified to comment on quite how promising this work seems, but I find the report both accessible and compelling. I recommend it to anyone curious about where some of the alignment leading e...

In terms of the relationship to MIRI's visible thoughts project, I'd say the main difference is that ARC is attempting to solve ELK in the worst case (where the way the AI understands the world could be arbitrarily alien from and more sophisticated than the way the human understands the world), whereas the visible thoughts project is attempting to encourage a way of developing AI that makes ELK easier to solve (by encouraging the way the AI thinks to resemble the way humans think). My understanding is MIRI is quite skeptical that a solution to worst-case ELK is possible, which is why they're aiming to do something more like "make it more likely that conditions are such that ELK-like problems can be solved in practice."

6 Ajeya Cotra 23d

Thanks Ruby! I'm really glad you found the report accessible. One clarification: Bayes nets aren't important to ARC's conception of the problem of ELK or its solution, so I don't think it makes sense to contrast ARC's approach against an approach focused on language models or describe it as seeking a solution via Bayes nets.

The form of a solution to ELK will still involve training a machine learning model (which will certainly understand language and could just be a language model) using some loss function. The idea that this model could learn to represent its understanding of the world in the form of inference on some Bayes net is one of a few simple test cases that ARC uses to check whether the loss functions they're designing will always incentivize honestly answering straightforward questions. For example, another simple test case (not included in the report) is that the model could learn to represent its understanding of the world in a bunch of "sentences" that it performs logical operations on to transform into other sentences.

These test cases are settings for counterexamples, but not crucial to proposed solutions. The idea is that if your loss function will always learn a model that answers straightforward questions honestly, it should work in particular for these simplified cases that are easy to think about.
Reply to Eliezer on Biological Anchors

Curated. A few weeks ago I curated the post this is a response to. I'm excited to see a response that argues the criticized report was misinterpreted/misrepresented. I'd be even more excited to see a response to the response, whether from the authors involved so far or from anyone else. Someone once said (approximately) that successful conversation must permit four levels: saying X, critiquing X, critiquing the critique, and critiquing the critique of the critique. We're at 3 out of 4.

Reply to Eliezer on Biological Anchors

Continuing our experiments with the voting system, I've enabled two-axis voting for this thread too.

The two dimensions are:

  • Overall (left and right arrows): what is your overall feeling about the comment? Does it contribute positively to the conversation? Do you want to see more comments like this?
  • Agreement (check and cross): do you agree with the position of this comment?
The Plan

Curated. Not many people pursue agendas to solve the whole alignment problem, and of those, even fewer write up their plan clearly. I really appreciate this kind of document and would love to see more like it. Shoutout to the back-and-forth between John and Scott Garrabrant about John's characterization of MIRI and its relation to John's work.

How uniform is the neocortex?

This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI"; in other words, there don't necessarily have to be complicated, fine-tuned algorithms for different advanced functions. You can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.

Biology-Inspired AGI Timelines: The Trick That Never Works

Curated. Many times over the years I've seen analogies from biology used to produce estimates about AI timelines. This is the most thoroughly-argued case I've seen against them. While I believe some find the format uncomfortable, I'm personally glad to see Eliezer expressing his beliefs as he feels them, and think this is worth reading for anyone interested in predicting how AI will play out in coming years.

For those short on time, I recommend this summary by Grant Demaree.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Curated. Although this isn't a LessWrong post, it seems like a notable result for AGI progress. Also see this highly-upvoted, accessible explanation of why EfficientZero is a big deal. Lastly, I recommend the discussion in the comments here.

The theory-practice gap

Curated. This post introduces a useful frame for thinking about different kinds of alignment work and related differences of opinion.

[External Event] 2022 IEEE Conference on Assured Autonomy (ICAA)

Speaking as a moderator here, I think it's great that Aryeh is posting about this event. Probably of interest to a few people, so at least worth having as a Personal blogpost.

Welcome & FAQ!

You're right. Maybe worth the extra words for now.

What 2026 looks like

Curated. This post feels virtuous to me. I'm used to people talking about timelines in terms of X% chance of Y by year Z; or otherwise in terms of a few macro features (GDP doubling every N months, FOOM). This post, even if most of the predictions turn out to be false, is the kind of piece that enables us to start having specific conversations about how we expect things to play out and why. It helps me see what Daniel expects. And it's concrete enough to argue with. For that, bravo.

AF Default Moderator Responses

Low quality

This is a test template message. Submission declined because of quality.

AF Default Moderator Responses

Off topic

This is a placeholder template message for testing. Something something content declined because off-topic.

Search versus design

Honestly, Pace and Pence should team up to make a super team. Nominative similarity ought to be a Schelling feature for coordination.

2 Ben Pace 1y
He can be vice president.