All of Ruby's Comments + Replies

The Alignment Forum is supposed to be a place with a very high signal-to-noise ratio for Alignment content, where researchers can trust that all content they read will be material they're interested in seeing (even at the expense of some false negatives).

The hard part now seems to me to be crafting some kind of useful standard, rather than one that in hindsight makes us go "well, that sure gave everyone a false sense of security".

Curated. "Big if true". I love the depth and detail in shard theory. Separate from whether all its details are correct, I feel reading and thinking about this will get me towards a better understanding of humans and artificial networks both, if only via making reflect on how things work.

I do fear that shard theory gets a bit too much popularity from the coolness of the name, but I do think there is merit here, and if we had more theories of this scope, it'd be quite good.

Curated. It's not every day that someone attempts to add concepts to the axioms of game theory/bargaining theory/utility theory, and I'm pretty excited for where this is headed, especially if the implications for EA and x-risk are real.

When I process new posts, I add tags so they can more easily be found later. I wasn't sure what to tag this one with beyond "AI". I think it'd increase the likelihood that your post gets found and read later if you look through the available AI tags (you can search or use the Concepts page), or create new tags if you think none of the existing ones fit.

Adam Jermyn:
Thanks! I'll try to do that in the future (and will add some to this).

Curated. I could imagine a world where different people pursue different agendas in a “live and let live” way, with no one wanting to be too critical of anyone else. I think that’s a world where many people could waste a lot of time with nothing prompting them to reconsider. I think posts like this one give us a chance to avoid scenarios like that. And posts like this can spur discussion of the higher-level approaches/intuitions that spawn more object-level research agendas. The top comments here by Paul Christiano, John Wentworth, and others are a great i... (read more)

Curated. Eliezer's List of Lethalities post has received an immense amount of attention, rightly so given the content, and I am extremely glad to see this response go live, since Eliezer's views do not reflect a consensus and it would be sad to have only one set of views getting all the attention when I do think many of the questions are non-obvious.

I am very pleased to see public back-and-forth on questions of not just "how and whether we are doomed", but the specific gears behind them (which things will work vs. which cannot). These questions bear... (read more)

I'm curious about why you decided it wasn't worth your time.

Going from the post itself, the case for publishing it goes something like "the whole field of AI Alignment is failing to produce useful work because people aren't engaging with what's actually hard about the problem and are ignoring all the ways their proposals are doomed; perhaps yelling at them via this post might change some of that."

Accepting the premises (which I'm inclined to), trying to get the entire field to correct course seems actually pretty valuable, maybe even worth a month of your time, now that I think about it.

First and foremost, I have been making extraordinarily rapid progress in the last few months, though most of that is not yet publicly visible.

Second, a large part of why people pour effort into not-very-useful work is that the not-very-useful work is tractable. Useless, but at least you can make progress on the useless thing! Few people really want to work on problems which are actually Hard, so people will inevitably find excuses to do easy things instead. As Eliezer himself complains, writing the list just kicks the can down the road; six months later pe... (read more)

Curated. I think this is a message that's well worth getting out there, and a write-up of a message I find myself telling people often. As more people are interested in joining the Alignment field, I think we should establish this as a way that people can start contributing. A suggestion here is that people can further flesh out LessWrong wiki-tag pages on AI (see the Concepts page), and I'd be interested in building further frameworks on LessWrong to enable distillation work.

Curated. I like fiction. I like that this story is fiction. I hope that all stories even at all vaguely like this one remain fiction.

Curated. Exercises are crucial for the mastery of topics and the transfer of knowledge; it's great to see someone coming up with them for the nebulous field of Alignment.

Here it is! https://www.lesswrong.com/s/4aARF2ZoBpFZAhbbe

You might want to edit the description and header image.

We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?

Jacob Steinhardt:
Yup! That sounds great :)

Curated. This post cleanly gets at some core disagreements in the AI [Alignment] field, and I think does so from a more accessible frame/perspective than other posts on LessWrong and the Alignment Forum. I'm hopeful that this post and others in the sequence will enable better and more productive conversations between researchers, and for that matter, just better thoughts!

Jacob Steinhardt:
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?

Thanks for the clarification, Ajeya! Sorry to make you have to explain that; it was a mistake to imply that ARC’s conception is specifically anchored on Bayes nets, and the report was quite clear that it isn't.

Curated. The authors write:

We believe that there are many promising and unexplored approaches to this problem, and there isn’t yet much reason to believe we are stuck or are faced with an insurmountable obstacle.

If it's true both that this is a core alignment problem and that we're not stuck on it, then that's fantastic. I am not an alignment researcher and don't feel qualified to comment on quite how promising this work seems, but I find the report both accessible and compelling. I recommend it to anyone curious about where some of the alignment leading e... (read more)

In terms of the relationship to MIRI's visible thoughts project, I'd say the main difference is that ARC is attempting to solve ELK in the worst case (where the way the AI understands the world could be arbitrarily alien from and more sophisticated than the way the human understands the world), whereas the visible thoughts project is attempting to encourage a way of developing AI that makes ELK easier to solve (by encouraging the way the AI thinks to resemble the way humans think). My understanding is MIRI is quite skeptical that a solution to worst-case ELK is possible, which is why they're aiming to do something more like "make it more likely that conditions are such that ELK-like problems can be solved in practice."

Ajeya Cotra:
Thanks Ruby! I'm really glad you found the report accessible. One clarification: Bayes nets aren't important to ARC's conception of the problem of ELK or its solution, so I don't think it makes sense to contrast ARC's approach against an approach focused on language models or describe it as seeking a solution via Bayes nets.

The form of a solution to ELK will still involve training a machine learning model (which will certainly understand language and could just be a language model) using some loss function. The idea that this model could learn to represent its understanding of the world in the form of inference on some Bayes net is one of a few simple test cases that ARC uses to check whether the loss functions they're designing will always incentivize honestly answering straightforward questions. For example, another simple test case (not included in the report) is that the model could learn to represent its understanding of the world in a bunch of "sentences" that it performs logical operations on to transform into other sentences.

These test cases are settings for counterexamples, but not crucial to proposed solutions. The idea is that if your loss function will always learn a model that answers straightforward questions honestly, it should work in particular for these simplified cases that are easy to think about.

Curated. A few weeks ago I curated the post this is a response to. I'm excited to see a response that argues the criticized report was misinterpreted/misrepresented. I'd be even more excited to see a response to the response–by the authors involved so far or anyone else. Someone once said (approx) that successful conversation must permit four levels: saying X, critiquing X, critiquing the critique, and critiquing the critique of the critique. We're at 3 out of 4.

Continuing our experiments with the voting system, I've enabled two-axis voting for this thread too.

The two dimensions are:

  • Overall (left and right arrows): what is your overall feeling about the comment? Does it contribute positively to the conversation? Do you want to see more comments like this?
  • Agreement (check and cross): do you agree with the position of this comment?
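For anyone wondering how the two axes interact: they don't; each is tallied independently. Here is a minimal sketch of how such a vote could be represented and counted. The type and field names are hypothetical illustrations, not LessWrong's actual schema.

```typescript
// Hypothetical sketch of a two-axis vote; names are illustrative only.
type OverallVote = "left" | "right" | null;    // left = downvote, right = upvote
type AgreementVote = "cross" | "check" | null; // cross = disagree, check = agree

interface TwoAxisVote {
  commentId: string;
  userId: string;
  overall: OverallVote;     // "do I want more comments like this?"
  agreement: AgreementVote; // "do I agree with the position taken?"
}

// Each axis is summed separately, so a comment can score well overall
// even if most voters disagree with it (and vice versa).
function tally(votes: TwoAxisVote[]): { overall: number; agreement: number } {
  return votes.reduce(
    (acc, v) => ({
      overall:
        acc.overall + (v.overall === "right" ? 1 : v.overall === "left" ? -1 : 0),
      agreement:
        acc.agreement + (v.agreement === "check" ? 1 : v.agreement === "cross" ? -1 : 0),
    }),
    { overall: 0, agreement: 0 }
  );
}
```

Keeping the two tallies separate is the point of the experiment: "this contributes to the conversation" and "I agree with this" no longer have to share a single number.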

Curated. Not that many people pursue agendas to solve the whole alignment problem and of those even fewer write up their plan clearly. I really appreciate this kind of document and would love to see more like this. Shoutout to the back and forth between John and Scott Garrabrant about John's characterization of MIRI and its relation to John's work.

This post is what first gave me a major update towards "an AI with a simple single architectural pattern scaled up sufficiently could become AGI", in other words, there doesn't necessarily have to be complicated fine-tuned algorithms for different advanced functions–you can get lots of different things from the same simple structure plus optimization. Since then, as far as I can tell, that's what we've been seeing.

Curated. Many times over the years I've seen analogies from biology used to produce estimates about AI timelines. This is the most thoroughly-argued case I've seen against them. While I believe some find the format uncomfortable, I'm personally glad to see Eliezer expressing his beliefs as he feels them, and think this is worth reading for anyone interested in predicting how AI will play out in coming years.

For those short on time, I recommend this summary by Grant Demaree.

Curated. Although this isn't a LessWrong post, it seems like a notable result for AGI progress.  Also see this highly-upvoted, accessible explanation of why EfficientZero is a big deal. Lastly,  I recommend the discussion in the comments here.

Curated. This post introduces a useful frame for thinking about different kinds of alignment work and related differences of opinion.

Speaking as a moderator here, I think it's great that Aryeh is posting about this event. Probably of interest to a few people, so at least worth having as a Personal blogpost.

You're right. Maybe worth the extra words for now.

Curated. This post feels virtuous to me. I'm used to people talking about timelines in terms of X% chance of  Y by year Z; or otherwise in terms of a few macro features (GDP doubling every N months, FOOM). This post, even if most of the predictions turn out to be false, is the kind of piece that enables us to start having specific conversations about how we expect things to play out and why.  It helps me see what Daniel expects. And it's concrete enough to argue with. For that, bravo.

Low quality

This is a test template message. Submission declined because of quality.

Off topic

This is a placeholder template message for testing. Something something content declined because off-topic.

Honestly, Pace and Pence should team up to make a super team. Nominative similarity ought to be a Schelling feature for coordination.

Ben Pace:
He can be vice president.