Welcome & FAQ!

Ruby; habryka

The AI Alignment Forum was launched in 2018. Since then, several hundred researchers have contributed approximately two thousand posts and nine thousand comments. Nearing the third birthday of the Forum, we are publishing this updated and clarified FAQ.

*Minimalist, watercolor sketch of humanity spreading across the stars by VQGAN*

I have a practical question concerning a site feature.

Almost all of the Alignment Forum site features are shared with LessWrong.com; have a look at the LessWrong FAQ for questions concerning the Editor, Voting, Questions, Notifications & Subscriptions, Moderation, and more.

If you can’t easily find the answer there, ping us on Intercom (bottom right of the screen) or email us at team@lesswrong.com.

What is the AI Alignment Forum?

The Alignment Forum is a single online hub for researchers to discuss all ideas related to ensuring that transformatively powerful AIs are aligned with human values. Discussion ranges from technical models of agency to the strategic landscape, and everything in between.

Top voted posts include What failure looks like, Are we in an AI overhang?, and Embedded Agents. A list of the top posts of all time can be viewed here.

While direct participation in the Forum is limited to deeply established researchers in the field, we have designed it also as a place where up-and-coming researchers can get up to speed on the research paradigms and have pathways to participation too. See How can non-members participate in the Forum? below.

We hope that by being the foremost discussion platform and publication destination for AI Alignment work, the Forum will serve as the archive and library of the field. To find posts by sub-topic, view the AI section of the Concepts page.

Why was the Alignment Forum created?

Foremost, because misaligned powerful AIs may pose the greatest risk to our civilization that has ever arisen. The problem is of unknown (or at least unagreed upon) difficulty, and allowing the researchers in the field to better communicate and share their thoughts seems like one of the best things we could do to help the pre-paradigmatic field.

In the past, journals or conferences might have been the best methods for increasing discussion and collaboration, but in the current age we believe that a well-designed online forum with things like immediate publication, distributed rating of quality (i.e. “peer review”), portability/shareability (e.g. via links), etc., provides the most promising way for the field to develop good standards and methodologies.

A further major benefit of having alignment content and discussion in one easily accessible place is that it helps new researchers get onboarded to the field. Hopefully, this will help them begin contributing sooner.

Who is the AI Alignment Forum for?

There exists an interconnected community of Alignment researchers in industry, academia, and elsewhere who have spent many years thinking carefully about a variety of approaches to alignment. Such research receives institutional support from organizations including FHI, CHAI, DeepMind, OpenAI, MIRI, Open Philanthropy, ARC, and others. The Alignment Forum membership currently consists of researchers at these organizations and their respective collaborators.

The Forum is also intended to be a way for people not connected to these institutions, either professionally or socially, to interact with and contribute to cutting-edge research. There have been many such individuals on LessWrong, and that is currently the best place for such people to start contributing, to get feedback, and to skill up in this domain.

There are about 50-100 members of the Forum, who are able to (1) post and comment directly to the Forum without review and (2) promote the content of others to the Forum. This group will not grow quickly; however, as of August 2021, we have made it easier for non-members to submit content to the Forum.

What type of content is appropriate?

As a rule-of-thumb, if a thought is something you’d bring up when talking to someone at a research workshop or to a colleague in your lab, it’s also a welcome contribution here.

If you’d like a sense of what other Forum members are interested in, here’s some data from a survey conducted during the open beta of the Forum (n = 34). We polled these early users on what high-level categories of content they were interested in.

The responses were on a 1-5 scale, which represented “If I see 1 post per day, I want to see this type of content…” (1) Once per year, (2) Once per 3-4 months, (3) Once per 1-2 months, (4) Once per 1-2 weeks, (5) A third of all posts that I see.

New theory-oriented alignment research typical of MIRI or CHAI: 4.4 / 5
New ML-oriented alignment research typical of OpenAI or DeepMind's safety teams: 4.2 / 5
New formal or nearly-formal discussion of intellectually interesting topics that look questionably/ambiguously/peripherally alignment-related: 3.5 / 5
High-quality informal discussion of alignment research methodology and background assumptions, what's needed for progress on different agendas, why people are pursuing this or that agenda, etc: 4.1 / 5
Attempts to more clearly package/explain/summarise previously discussed alignment research: 3.7 / 5
New technical ideas that are clearly not alignment-related but are likely to be intellectually interesting to forum regulars: 2.2 / 5
High-quality informal discussion of very core background questions about advanced AI systems: 3.3 / 5
Typical AGI forecasting research/discussion that isn't obviously unusually relevant to AGI alignment work: 2.2 / 5

What is the relationship between the Alignment Forum and LessWrong?

The Alignment Forum was created by and is maintained by the team behind LessWrong (the web forum). The two sites share a codebase and database. They integrate in the following ways:

Automatic Crossposting - Any new post or comment on the new AI Alignment Forum is automatically cross-posted to LessWrong.com. Accounts are also shared between the two platforms (though non-AF member accounts will not be able to post without review).
Content Promotion - Any comment or post on LessWrong can be promoted by members of the AI Alignment Forum to the AI Alignment Forum.
Separate Reputation - The reputation systems (karma) for LessWrong and the AI Alignment Forum are separate. On LessWrong you can see two reputation scores: a primary karma score combining karma from both sites, and a secondary karma score specific to AI Alignment Forum members. On the AI Alignment Forum, you will just see the AI Alignment Forum karma of posts and comments.
Content Ownership - If a comment or post of yours is promoted to the AI Alignment Forum, you will continue to have full ownership of the content, and you’ll be able to respond directly to all comments on your content.

Both LessWrong and the Alignment Forum are foci of Alignment discussion; however, the Alignment Forum maintains even higher standards of content quality than LessWrong. The goal is to provide a place where researchers with shared technical and conceptual background can collaborate, and where a strong set of norms for facilitating good research collaborations can take hold. For this reason, both submissions to and members of the Alignment Forum are heavily vetted.

How do I get started in AI Alignment research?

If you're new to the AI Alignment research field, we recommend four great introductory sequences that cover several different paradigms of thought within the field. Get started reading them and feel free to leave comments with any questions you have.

The introductory sequences are:

Embedded Agency by Scott Garrabrant and Abram Demski of MIRI
Iterated Amplification by Paul Christiano of ARC
Value Learning by Rohin Shah of DeepMind
AGI Safety from First Principles by Richard Ngo, formerly of DeepMind

Following that, you might want to begin writing up some of your thoughts and sharing them on LessWrong to get feedback.

How do I join the Alignment Forum?

As described above, membership in the Alignment Forum is very selective (and not strictly required to participate in discussions of Alignment Forum content, since one can do so on LessWrong).

The best pathway towards becoming a member is to produce lots of great AI Alignment content, and to post it to LessWrong and participate in discussions there. The LessWrong/Alignment Forum admins monitor activity on both sites, and if someone consistently contributes to Alignment discussions on LessWrong that get promoted to the Alignment Forum, then it’s quite possible full membership will be offered.

I work professionally on AI Alignment. Shouldn’t I be a member?

Maybe, but not definitely! The bar for membership is higher than working on AI Alignment professionally, even if you are doing really great work. Membership, which allows you to directly post and comment, is likely to be offered only after multiple existing Alignment Forum members are excited to see your work. Until then, a review step is required: you can still submit content to the Alignment Forum, but it might take a few days for a decision to be made.

Another reason for the high bar for membership is that any member has the ability to promote content to the Alignment Forum, kind of like a curator. This requires significant trust, so membership is restricted to those who have earned this level of trust among the Alignment Forum members.

How can non-members participate in the Forum?

Non-members can participate in the Forum in two ways:

1. Posting Alignment content and comments to LessWrong

Alignment content posted to LessWrong will be seen by many of the researchers present on the Alignment Forum. If they (or the Forum admins) think that particular content is a good fit for the Forum, it will be promoted to the Forum and become viewable there.

If your posts or comments are promoted to the Alignment Forum, you will be able to directly participate in the discussion of your content on the Forum.

2. Submitting content on the Alignment Forum

Non-members can now submit content directly on the Alignment Forum (and not just via LessWrong).

If you post or comment, your submission will enter a review queue, and a decision to accept it to or reject it from the Alignment Forum will be made within three days. If it is rejected, you will receive an explanation of at least one sentence.

In the meantime (and regardless of outcome), your post or comment will be published to LessWrong. There it can be immediately viewed and discussed by everyone, and edited by you. This allows you to get quick feedback, and allows site admins to use the reaction there to help decide whether it is a good fit for the Alignment Forum. For example, if several Alignment Forum members are discussing your content on LessWrong, it is likely a good fit for the Forum and will be promoted.

How can I submit something I already wrote?

If you have already written and published a post on LessWrong but would like to submit it for acceptance to the Alignment Forum, please contact us via Intercom (bottom right of the screen) or email us at team@lesswrong.com.

Who runs the Alignment Forum?

The Alignment Forum is maintained and run by the LessWrong team, who also run the LessWrong website. An independent board composed of representatives of major Alignment research orgs (and independent members too) oversees major decisions concerning the Forum.

Can I use LaTeX?

Yes! You can use LaTeX in posts and comments with Cmd+4 / Ctrl+4.

Also, if you go into your user settings and switch to the markdown editor, you can just copy-paste LaTeX into a post/comment and it will render when you submit with no further steps required.
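As a rough illustration of what pasted LaTeX might look like in the markdown editor: the sketch below assumes single dollar signs delimit inline math and double dollar signs delimit display math. That delimiter convention is an assumption not stated in this FAQ, so check the LessWrong FAQ if it does not render. The formula itself is an arbitrary example.

```latex
The expected utility of an action is $\mathbb{E}[U(a)] = \sum_{s} P(s \mid a)\, U(s)$.

$$
a^{*} = \arg\max_{a} \sum_{s} P(s \mid a)\, U(s)
$$
```

If the assumption holds, the first line would render as a sentence with inline math and the second block as a centered equation.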

I have a different question.

Please don’t hesitate to contact us via Intercom (bottom right of the screen) or email us at team@lesswrong.com. We’d love to answer your questions.

Evan R. Murphy (2y)

How do I get started in AI Alignment research?
If you're new to the AI Alignment research field, we recommend four great introductory sequences that cover several different paradigms of thought within the field. Get started reading them and feel free to leave comments with any questions you have.
The introductory sequences are:
Embedded Agency by Scott Garrabrant and Abram Demski of MIRI
Iterated Amplification by Paul Christiano of ARC
Value Learning by Rohin Shah of DeepMind
AGI Safety from First Principles by Richard Ngo, formerly of DeepMind
Following that, you might want to begin writing up some of your thoughts and sharing them on LessWrong to get feedback.

I think it would be great to update this section. For example, it could link to the AGI Safety Fundamentals curriculum which has a wealth of valuable readings not on this list. And there are other courses that it would be good for newcomers to know about as well, such as MLAB.

Why am I suggesting this? This FAQ was the first place I found with clear advice when I was first getting interested in AI alignment in late 2021, and I took it quite seriously/literally. The very first alignment research I tried to read was the illustrated Embedded Agency sequence, because that was at the top of the above list. While I came to later appreciate Embedded Agency, I found this sequence (particularly the illustrated version which features prominently in the link above, as opposed to the text version) to be a confusing introduction to alignment. I also wasn't immediately aware of anything important there was to read outside of the 4 texts linked above, while I now feel like there's a lot!

It's just one data point of user testing on this FAQ, but something to consider.

Thomas Kwa (1y)

That section is even more outdated now. There's nothing on interpretability, Paul's work now extends far beyond IDA, etc. In my opinion it should link to some other guide.

Oliver Habryka (1y)

Yeah, does sure seem like we should update something here. I am planning to spend more time on AIAF stuff soon, but until then, if someone has a drop-in paragraph, I would probably lightly edit it and then just use whatever you send me/post here.

johnswentworth (3y)

I recommend that the title make it clearer that non-members can now submit alignment forum content for review, since this post is cross-posted on LW.

Ruben Bloom (3y)

You're right. Maybe worth the extra words for now.

Ruben Bloom (1y)

The Alignment Forum is supposed to be a very high signal-to-noise place for Alignment content, where researchers can trust that all content they read will be material they're interested in seeing (even at the expense of some false negatives).