All of Ruby's Comments + Replies

Where I agree and disagree with Eliezer

Curated. Eliezer's List of Lethalities post has received an immense amount of attention, rightly so given the content. I am extremely glad to see this response go live: Eliezer's views do not reflect a consensus, and it would be sad to have only one set of views getting all the attention when I think many of the questions are non-obvious.

I am very pleased to see public back-and-forth on questions of not just "how and whether we are doomed", but the specific gears behind them (where things will work vs cannot work). These questions bear...

AGI Ruin: A List of Lethalities

I'm curious about why you decided it wasn't worth your time.

Going from the post itself, the case for publishing it goes something like "the whole field of AI Alignment is failing to produce useful work because people aren't engaging with what's actually hard about the problem and are ignoring all the ways their proposals are doomed; perhaps yelling at them via this post might change some of that."

Accepting the premises (which I'm inclined to), trying to get the entire field to correct course seems actually pretty valuable, maybe even worth a month of your time, now that I think about it.

First and foremost, I have been making extraordinarily rapid progress in the last few months, though most of that is not yet publicly visible.

Second, a large part of why people pour effort into not-very-useful work is that the not-very-useful work is tractable. Useless, but at least you can make progress on the useless thing! Few people really want to work on problems which are actually Hard, so people will inevitably find excuses to do easy things instead. As Eliezer himself complains, writing the list just kicks the can down the road; six months later pe...

Call For Distillers

Curated. I think this is a message that's well worth getting out there, and a write-up of a message I find myself telling people often. As more people are interested in joining the Alignment field, I think we should establish that this is a way people can start contributing. A suggestion here is that people can further flesh out LessWrong wiki-tag pages on AI (see the concepts page), and I'd be interested in building further framework on LessWrong to enable distillation work.

It Looks Like You're Trying To Take Over The World

Curated. I like fiction. I like that this story is fiction. I hope that all stories even at all vaguely like this one remain fiction.

Alignment research exercises

Curated. Exercises are crucial for the mastery of topics and the transfer of knowledge; it's great to see someone coming up with them for the nebulous field of Alignment.

More Is Different for AI

Here it is!

You might want to edit the description and header image.

More Is Different for AI

We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?

1 Jacob Steinhardt 4mo
Yup! That sounds great :)

More Is Different for AI

Curated. This post cleanly gets at some core disagreement in the AI [Alignment] fields, and I think does so from a more accessible frame/perspective than other posts on LessWrong and the Alignment Forum. I'm hopeful that this post and others in the sequence will enable better and more productive conversations between researchers, and for that matter, just better thoughts!

1 Jacob Steinhardt 4mo
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?

ARC's first technical report: Eliciting Latent Knowledge

Thanks for the clarification, Ajeya! Sorry to make you have to explain that; it was a mistake to imply that ARC’s conception is specifically anchored on Bayes nets–the report was quite clear that it isn’t.

ARC's first technical report: Eliciting Latent Knowledge

Curated. The authors write:

We believe that there are many promising and unexplored approaches to this problem, and there isn’t yet much reason to believe we are stuck or are faced with an insurmountable obstacle.

If it's true that this is both a core alignment problem and we're not stuck on it, then that's fantastic. I am not an alignment researcher and don't feel qualified to comment on quite how promising this work seems, but I find the report both accessible and compelling. I recommend it to anyone curious about where some of the alignment leading e...

In terms of the relationship to MIRI's visible thoughts project, I'd say the main difference is that ARC is attempting to solve ELK in the worst case (where the way the AI understands the world could be arbitrarily alien from and more sophisticated than the way the human understands the world), whereas the visible thoughts project is attempting to encourage a way of developing AI that makes ELK easier to solve (by encouraging the way the AI thinks to resemble the way humans think). My understanding is MIRI is quite skeptical that a solution to worst-case ELK is possible, which is why they're aiming to do something more like "make it more likely that conditions are such that ELK-like problems can be solved in practice."

6 Ajeya Cotra 6mo
Thanks Ruby! I'm really glad you found the report accessible. One clarification: Bayes nets aren't important to ARC's conception of the problem of ELK or its solution, so I don't think it makes sense to contrast ARC's approach against an approach focused on language models or describe it as seeking a solution via Bayes nets. The form of a solution to ELK will still involve training a machine learning model (which will certainly understand language and could just be a language model) using some loss function. The idea that this model could learn to represent its understanding of the world in the form of inference on some Bayes net is one of a few simple test cases that ARC uses to check whether the loss functions they're designing will always incentivize honestly answering straightforward questions. For example, another simple test case (not included in the report) is that the model could learn to represent its understanding of the world in a bunch of "sentences" that it performs logical operations on to transform into other sentences. These test cases are settings for counterexamples, but not crucial to proposed solutions. The idea is that if your loss function will always learn a model that answers straightforward questions honestly, it should work in particular for these simplified cases that are easy to think about.

Reply to Eliezer on Biological Anchors

Curated. A few weeks ago I curated the post this is a response to. I'm excited to see a response that argues the criticized report was misinterpreted/misrepresented. I'd be even more excited to see a response to the response–by the authors involved so far or anyone else. Someone once said (approx) that successful conversation must permit four levels: saying X, critiquing X, critiquing the critique, and critiquing the critique of the critique. We're at 3 out of 4.

Reply to Eliezer on Biological Anchors

Continuing our experiments with the voting system, I've enabled two-axis voting for this thread too.

The two dimensions are:

  • Overall (left and right arrows): what is your overall feeling about the comment? Does it contribute positively to the conversation? Do you want to see more comments like this?
  • Agreement (check and cross): do you agree with the position of this comment?

The Plan

Curated. Not that many people pursue agendas to solve the whole alignment problem, and of those, even fewer write up their plan clearly. I really appreciate this kind of document and would love to see more like this. Shoutout to the back and forth between John and Scott Garrabrant about John's characterization of MIRI and its relation to John's work.

How uniform is the neocortex?

This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI". In other words, there don't necessarily have to be complicated fine-tuned algorithms for different advanced functions–you can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.

Biology-Inspired AGI Timelines: The Trick That Never Works

Curated. Many times over the years I've seen analogies from biology used to produce estimates about AI timelines. This is the most thoroughly-argued case I've seen against them. While I believe some find the format uncomfortable, I'm personally glad to see Eliezer expressing his beliefs as he feels them, and think this is worth reading for anyone interested in predicting how AI will play out in coming years.

For those short on time, I recommend this summary by Grant Demaree.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Curated. Although this isn't a LessWrong post, it seems like a notable result for AGI progress. Also see this highly-upvoted, accessible explanation of why EfficientZero is a big deal. Lastly, I recommend the discussion in the comments here.

The theory-practice gap

Curated. This post introduces a useful frame for thinking about different kinds of alignment work and related differences of opinion.

[External Event] 2022 IEEE Conference on Assured Autonomy (ICAA)

Speaking as a moderator here, I think it's great that Aryeh is posting about this event. Probably of interest to a few people, so at least worth having as a Personal blogpost.

Welcome & FAQ!

You're right. Maybe worth the extra words for now.

What 2026 looks like

Curated. This post feels virtuous to me. I'm used to people talking about timelines in terms of X% chance of Y by year Z; or otherwise in terms of a few macro features (GDP doubling every N months, FOOM). This post, even if most of the predictions turn out to be false, is the kind of piece that enables us to start having specific conversations about how we expect things to play out and why. It helps me see what Daniel expects. And it's concrete enough to argue with. For that, bravo.

AF Default Moderator Responses

Low quality

This is a test template message. Submission declined because of quality.

AF Default Moderator Responses

Off topic

This is a placeholder template message for testing. Something something content declined because off-topic.

Search versus design

Honestly, Pace and Pence should team up to make a super team. Nominative similarity ought to be a Schelling feature for coordination.

2 Ben Pace 2y
He can be vice president.