Ruby - AI Alignment Forum

The Alignment Forum is supposed to be a very high signal-to-noise place for Alignment content, where researchers can trust that all content they read will be material they're interested in seeing (even at the expense of some false negatives).

evhub's Shortform

Ruben Bloom1y90

The hard part to me now seems to be in crafting some kind of useful standard, rather than one that in hindsight makes us go "well, that sure gave everyone a false sense of security".

The shard theory of human values

Ruben Bloom2y104

Curated. "Big if true". I love the depth and detail in shard theory. Separate from whether all its details are correct, I feel reading and thinking about this will get me towards a better understanding of humans and artificial networks both, if only via making me reflect on how things work.

I do fear that shard theory gets a bit too much popularity from the coolness of the name, but I do think there is merit here, and if we had more theories of this scope, it'd be quite good.

«Boundaries», Part 1: a key missing concept from utility theory

Ruben Bloom2y66

Curated. It's not every day that someone attempts to add concepts to the axioms of game theory/bargaining theory/utility theory, and I'm pretty excited for where this is headed, especially if the implications are real for EA and x-risk.

Conditioning Generative Models with Restrictions

Ruben Bloom2y20

When I process new posts, I add tags so they can more easily be found later. I wasn't sure what to tag this one with beyond "AI". I think it'd increase the likelihood your post gets found and read later if you look through the available AI tags (you can search or use the Concepts page), or create new tags if you think none of the existing ones fit.

On how various plans miss the hard bits of the alignment challenge

Ruben Bloom2y44

Curated. I could imagine a world where different people pursue different agendas in a “live and let live” way, with no one wanting to be too critical of anyone else. I think that’s a world where many people could waste a lot of time with nothing prompting them to reconsider. I think posts like this one give us a chance to avoid scenarios like that. And posts like this can spur discussion of the higher-level approaches/intuitions that spawn more object-level research agendas. The top comments here by Paul Christiano, John Wentworth, and others are a great instance of this.

I also kind of like how this just further develops my gears-level understanding of why Nate predicts doom. There’s color here beyond AGI Ruin: List of Lethalities, which I assume captured most of Nate’s pessimism, but in fact I wonder if Nate disagrees with Eliezer and thinks things would be a bunch more hopeful if only people worked on the right stuff (in contrast with the view that the problem is too hard for our civilization).

Lastly I’ll note that I think it’s good that Nate wrote this post even before being confident he could pass other people’s ITT. I’m glad he felt it was okay to be critical (with caveats) even before his criticisms were maximally defensible (e.g. because he thinks he could pass an ITT).

Where I agree and disagree with Eliezer

Ruben Bloom2y718

Curated. Eliezer's List of Lethalities post has received an immense amount of attention, rightly so given the content, and I am extremely glad to see this response go live, since Eliezer's views do not reflect a consensus. It would be sad to have only one set of views getting all the attention when I do think many of the questions are non-obvious.

I am very pleased to see public back-and-forth on questions of not just "how and whether we are doomed", but the specific gears behind them (where things will work vs cannot work). These questions bear on the enormous resources poured into AI safety work right now. Ensuring those resources get allocated in a way that actually improves the odds of our success is key.

I hope that others continue to share and debate their models of the world, Alignment, strategy, etc. in a way that is both on record and easily findable by others. Hopefully, we can look back in 10, 20, 50, etc years and reflect on how well we reasoned in these cloudy times.

AGI Ruin: A List of Lethalities

Ruben Bloom2y21

I'm curious about why you decided it wasn't worth your time.

Going from the post itself, the case for publishing it goes something like "the whole field of AI Alignment is failing to produce useful work because people aren't engaging with what's actually hard about the problem and are ignoring all the ways their proposals are doomed; perhaps yelling at them via this post might change some of that."

Accepting the premises (which I'm inclined to), trying to get the entire field to correct course seems actually pretty valuable, maybe even worth a month of your time, now that I think about it.

Call For Distillers

Ruben Bloom2y40

Curated. I think this is a message that's well worth getting out there, and a write-up of a message I find myself telling people often. As more people are interested in joining the Alignment field, I think we should establish that this is a way people can start contributing. A suggestion here is that people can further flesh out LessWrong wiki-tag pages on AI (see the concepts page), and I'd be interested in building further framework on LessWrong to enable distillation work.

It Looks Like You're Trying To Take Over The World

Ruben Bloom2y30

Curated. I like fiction. I like that this story is fiction. I hope that all stories even at all vaguely like this one remain fiction.
