How do I get started in AI Alignment research?
If you're new to the AI Alignment research field, we recommend four great introductory sequences that cover several different paradigms of thought within the field. Get started reading them and feel free to leave comments with any questions you have.
The introductory sequences are:
- Embedded Agency by Scott Garrabrant and Abram Demski of MIRI
- Iterated Amplification by Paul Christiano of ARC
- Value Learning by Rohin Shah of DeepMind
- AGI Safety from First Principles by Richard Ngo, formerly of DeepMind
Following that, you might want to begin writing up some of your thoughts and sharing them on LessWrong to get feedback.
I think it would be great to update this section. For example, it could link to the AGI Safety Fundamentals curriculum which has a wealth of valuable readings not on this list. And there are other courses that it would be good for newcomers to know about as well, such as MLAB.
Why am I suggesting this? This FAQ was the first place I found with clear advice when I was first getting interested in AI alignment in late 2021, and I took it quite seriously/literally. The very first alignment research I tried to read was the illustrated Embedded Agency sequence, because that was at the top of the above list. While I came to later appreciate Embedded Agency, I found this sequence (particularly the illustrated version which features prominently in the link above, as opposed to the text version) to be a confusing introduction to alignment. I also wasn't immediately aware of anything important there was to read outside of the 4 texts linked above, while I now feel like there's a lot!
It's just one data point of user testing on this FAQ, but something to consider.
That section is even more outdated now. There's nothing on interpretability, Paul's work now extends far beyond IDA, etc. In my opinion it should link to some other guide.
Yeah, does sure seem like we should update something here. I am planning to spend more time on AIAF stuff soon, but until then, if someone has a drop-in paragraph, I would probably lightly edit it and then just use whatever you send me/post here.
The Alignment Forum is supposed to be a very high signal-to-noise place for Alignment content, where researchers can trust that all content they read will be material they're interested in seeing (even at the expense of some false negatives).
The AI Alignment Forum was launched in 2018. Since then, several hundred researchers have contributed approximately two thousand posts and nine thousand comments. Nearing the third birthday of the Forum, we are publishing this updated and clarified FAQ.
I have a practical question concerning a site feature.
Almost all of the Alignment Forum site features are shared with LessWrong.com; have a look at the LessWrong FAQ for questions concerning the Editor, Voting, Questions, Notifications & Subscriptions, Moderation, and more.
If you can’t easily find the answer there, ping us on Intercom (bottom right of the screen) or email us at team@lesswrong.com.
What is the AI Alignment Forum?
The Alignment Forum is a single online hub for researchers to discuss all ideas related to ensuring that transformatively powerful AIs are aligned with human values. Discussion ranges from technical models of agency to the strategic landscape, and everything in between.
Top voted posts include What failure looks like, Are we in an AI overhang?, and Embedded Agents. A list of the top posts of all time can be viewed here.
While direct participation in the Forum is limited to deeply established researchers in the field, we have also designed it as a place where up-and-coming researchers can get up to speed on the research paradigms and find pathways to participation. See How can non-members participate in the Forum? below.
We hope that by being the foremost discussion platform and publication destination for AI Alignment discussion, the Forum will serve as the archive and library of the field. To find posts by sub-topic, view the AI section of the Concepts page.
Why was the Alignment Forum created?
Foremost, because misaligned powerful AIs may pose the greatest risk to our civilization that has ever arisen. The problem is of unknown (or at least unagreed-upon) difficulty, and allowing the researchers in the field to better communicate and share their thoughts seems like one of the best things we could do to help the pre-paradigmatic field.
In the past, journals or conferences might have been the best methods for increasing discussion and collaboration. Today, we believe that a well-designed online forum, with features such as immediate publication, distributed rating of quality (i.e. “peer review”), and portability/shareability (e.g. via links), provides the most promising way for the field to develop good standards and methodologies.
A further major benefit of having alignment content and discussion in one easily accessible place is that it helps new researchers get onboarded to the field. Hopefully, this will help them begin contributing sooner.
Who is the AI Alignment Forum for?
There exists an interconnected community of Alignment researchers in industry, academia, and elsewhere who have spent many years thinking carefully about a variety of approaches to alignment. Such research receives institutional support from organizations including FHI, CHAI, DeepMind, OpenAI, MIRI, Open Philanthropy, ARC, and others. The Alignment Forum membership currently consists of researchers at these organizations and their respective collaborators.
The Forum is also intended to be a way for people not connected to these institutions, either professionally or socially, to interact with and contribute to cutting-edge research. There have been many such individuals on LessWrong, and that is currently the best place for such people to start contributing, get feedback, and skill up in this domain.
There are about 50-100 members of the Forum who are (1) able to post and comment directly to the Forum without review, and (2) able to promote the content of others to the Forum. This group will not grow quickly; however, as of August 2021, we have made it easier for non-members to submit content to the Forum.
What type of content is appropriate?
As a rule of thumb, if a thought is something you’d bring up when talking to someone at a research workshop or to a colleague in your lab, it’s also a welcome contribution here.
If you’d like a sense of what other Forum members are interested in, here’s some data from a survey conducted during the open beta of the Forum (n = 34). We polled these early users on what high-level categories of content they were interested in.
The responses were on a 1-5 scale, completing the sentence “If I see 1 post per day, I want to see this type of content…”: (1) once per year, (2) once per 3-4 months, (3) once per 1-2 months, (4) once per 1-2 weeks, (5) a third of all posts that I see.
What is the relationship between the Alignment Forum and LessWrong?
The Alignment Forum was created by and is maintained by the team behind LessWrong (the web forum). The two sites share a codebase and database. They integrate in the following ways:
Both LessWrong and the Alignment Forum are foci of Alignment discussion; however, the Alignment Forum maintains even higher standards of content quality than LessWrong. The goal is to provide a place where researchers with shared technical and conceptual background can collaborate, and where a strong set of norms for facilitating good research collaborations can take hold. For this reason, both submissions to and members of the Alignment Forum are heavily vetted.
How do I get started in AI Alignment research?
If you're new to the AI Alignment research field, we recommend four great introductory sequences that cover several different paradigms of thought within the field. Get started reading them and feel free to leave comments with any questions you have.
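The introductory sequences are:
- Embedded Agency by Scott Garrabrant and Abram Demski of MIRI
- Iterated Amplification by Paul Christiano of ARC
- Value Learning by Rohin Shah of DeepMind
- AGI Safety from First Principles by Richard Ngo, formerly of DeepMind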
Following that, you might want to begin writing up some of your thoughts and sharing them on LessWrong to get feedback.
How do I join the Alignment Forum?
As described above, membership in the Alignment Forum is very selective (and not strictly required to participate in discussions of Alignment Forum content, since one can do so on LessWrong).
The best pathway towards becoming a member is to produce lots of great AI Alignment content, and to post it to LessWrong and participate in discussions there. The LessWrong/Alignment Forum admins monitor activity on both sites, and if someone consistently contributes to Alignment discussions on LessWrong that get promoted to the Alignment Forum, then it’s quite possible full membership will be offered.
I work professionally on AI Alignment. Shouldn’t I be a member?
Maybe, but not definitely! The bar for membership is higher than working on AI Alignment professionally, even if you are doing really great work. Membership, which allows you to post and comment directly, is likely to be offered only after multiple existing Alignment Forum members are excited to see your work. Until then, a review step is required. You can still submit content to the Alignment Forum, but it might take a few days for a decision to be made.
Another reason for the high bar for membership is that any member has the ability to promote content to the Alignment Forum, much like a curator. This requires significant trust, so membership is restricted to those who have earned this level of trust among the Alignment Forum members.
How can non-members participate in the Forum?
Non-members can participate in the Forum in two ways:
1. Posting Alignment posts and comments to LessWrong
Alignment content posted to LessWrong will be seen by many of the researchers present on the Alignment Forum. If they (or the Forum admins) think that particular content is a good fit for the Forum, it will be promoted to the Forum and become viewable there.
If your posts or comments are promoted to the Alignment Forum, you will be able to directly participate in the discussion of your content on the Forum.
2. Submitting content on the Alignment Forum
Non-members can now submit content directly on the Alignment Forum (and not just via LessWrong).
How can I submit something I already wrote?
If you have already written and published a post on LessWrong but would like to submit it for acceptance to the Alignment Forum, please contact us via Intercom (bottom right of the screen) or email us at team@lesswrong.com.
Who runs the Alignment Forum?
The Alignment Forum is maintained and run by the LessWrong team, who also run the LessWrong website. An independent board composed of representatives of major Alignment research orgs (and independent members too) oversees major decisions concerning the Forum.
Can I use LaTeX?
Yes! You can use LaTeX in posts and comments with Cmd+4 / Ctrl+4.
Also, if you go into your user settings and switch to the markdown editor, you can just copy-paste LaTeX into a post/comment and it will render when you submit with no further steps required.
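As a rough illustration, a snippet like the one below is the kind of thing you might type after pressing Cmd+4 / Ctrl+4. The formula is an arbitrary example chosen for this sketch, not something taken from the Forum, and this FAQ does not say which delimiters (if any) the markdown editor expects around pasted LaTeX, so treat that part as something to check in the LessWrong FAQ.

```latex
% Hypothetical example formula: the expectation of f under a discrete distribution p
\mathbb{E}_{x \sim p}\left[ f(x) \right] = \sum_{x} p(x)\, f(x)
```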
I have a different question.
Please don’t hesitate to contact us via Intercom (bottom right of the screen) or email us at team@lesswrong.com. We’d love to answer your questions.