Ruben Bloom

Team Lead for LessWrong



Call For Distillers

Curated. I think this is a message that's well worth getting out there, and a write-up of advice I find myself giving people often. As more people become interested in joining the Alignment field, I think we should establish this as a way that people can start contributing. One suggestion here is that people can further flesh out LessWrong wiki-tag pages on AI (see the Concepts page), and I'd be interested in building further infrastructure on LessWrong to enable distillation work.

It Looks Like You're Trying To Take Over The World

Curated. I like fiction. I like that this story is fiction. I hope that all stories even at all vaguely like this one remain fiction.

Alignment research exercises

Curated. Exercises are crucial for the mastery of topics and the transfer of knowledge; it's great to see someone coming up with them for the nebulous field of Alignment.

More Is Different for AI

Here it is!

You might want to edit the description and header image.

More Is Different for AI

We can also make a Sequence. I assume "More Is Different for AI" should be the title of the overall Sequence too?

More Is Different for AI

Curated. This post cleanly gets at some core disagreements in the AI Alignment field, and I think it does so from a more accessible frame/perspective than other posts on LessWrong and the Alignment Forum. I'm hopeful that this post and others in the sequence will enable better and more productive conversations between researchers, and for that matter, just better thoughts!

ARC's first technical report: Eliciting Latent Knowledge

Thanks for the clarification, Ajeya! Sorry to make you have to explain that; it was a mistake to imply that ARC's conception is specifically anchored on Bayes nets. The report was quite clear that it isn't.

ARC's first technical report: Eliciting Latent Knowledge

Curated. The authors write:

We believe that there are many promising and unexplored approaches to this problem, and there isn’t yet much reason to believe we are stuck or are faced with an insurmountable obstacle.

If it's true both that this is a core alignment problem and that we're not stuck on it, then that's fantastic. I am not an alignment researcher and don't feel qualified to comment on quite how promising this work seems, but I find the report both accessible and compelling. I recommend it to anyone curious about where some of the leading edge of alignment research is.

Also, I find a striking resemblance here to MIRI's proposed Visible Thoughts project. They appear to be getting at the same thing, though via quite different models (i.e. Bayes nets vs. language models). It'd be amazing if both projects flourished and the understanding from each could be combined.
