Over the past six weeks, the Stanford Existential Risks Initiative (SERI) has been running a trial for the “ML Alignment Theory Scholars” (MATS) program. Our goal is to increase the number of people working on alignment theory. To do this, we’re running a scholars program that provides mentorship, funding, and community to promising new alignment theorists. The program is run in partnership with Evan Hubinger, who has mentored every scholar throughout the trial.

As the final phase of the trial, each participant has taken a previous research artifact (usually an Alignment Forum post) and written a distillation and expansion of it. Evan picked the posts, and each participant signed up for one they were interested in. Within the next two weeks (12/7 - 12/17), we’ll be publishing all of these posts on LessWrong and the Alignment Forum as a sequence, with a couple of posts going up each day. (There will be around 10-15 posts in total.)

Community Engagement

Evan will be evaluating each post to determine whether participants make it to the next stage of the seminar program (where they have the opportunity to do novel research with a mentor), but we’d also be interested in hearing community feedback on each post. This could be through upvotes alone or through comments as well. We’ll publish a concluding post with our takeaways a week or two after the final blog post has been released. If there is interest, we’d also be happy to post the story of MATS’ creation for other prospective community builders.

Additionally, if Evan knows you and you would be interested in mentoring one of the participants for the next stage of the program (for example, because you really liked their post and think it would be productive to work with them), please reach out to Evan.

Program Description

From here on, we’ll be discussing our program and future strategy. The sections above contain all the context needed to understand the series of blog posts.

The SERI MATS program aims to gather promising alignment theorists and give them an opportunity to learn from an experienced mentor.

The application process was as follows. We first asked the mentors of the Cambridge AGI Safety Fundamentals program to recommend promising participants, whom we then interviewed. After the interviews, we put the scholars through a paid, six-week trial period of 10 hours a week. During this time, scholars read through Evan’s reading list and distilled and expanded one of the research artifacts that Evan chose. Evan will be evaluating each final blog post to determine who gets into the scholars program.

The scholars program will initially be a two-month program in which scholars pursue one of the problems in Evan’s list of subproblems. Scholars have the option of forming teams. If they produce good research (as evaluated by Evan), we hope to be able to continue the program beyond the initial two months.

Future Steps

In the immediate future, we plan to keep refining the program and to expand to more mentors beyond Evan. Additionally, we’ll likely be running another round of this program this spring or summer, though we are not yet looking for applications.

Our long-term goal is to create a self-sustaining community of alignment theory mentees and mentors that helps build up new theorists and connect them to mentors, specific subproblems, and (hopefully) research organizations. In other words, we’re trying to facilitate the transition from finishing a reading group like AGI Safety Fundamentals to working on these problems as a part-time or full-time job.

Acknowledgements

We’re very grateful to be supported by a grant from Open Philanthropy. Additionally, we'd like to thank Sawyer from BERI for helping with the financial logistics.


 


Comments

If it’s interesting, we’d also be happy to post the story of MATS’ creation for other prospective community builders.

I personally would be very interested in this, especially with a mind to focusing on prosaic systems alignment today (as opposed to alignment theory).

How do you define alignment theory?