Sequences

Leveling Up: advice & resources for junior alignment researchers

Comments

Congratulations on launching!

On the governance side, one question I'd be excited to see Apollo (and ARC evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off? 

Of course, the actual answer will be shaped by the particular climate/culture/zeitgeist/policy window/lab factors that are impossible to fully predict in advance.

But my impression is that this question is relatively neglected, and I wouldn't be surprised if sharp newcomers were able to meaningfully improve the community's thinking on this. 

Thank you for writing this post, Adam! Looking forward to seeing what you and your epistemology team produce in the months ahead.

SERI MATS is doing a great job of scaling conceptual alignment research, and seem open to integrate some of the ideas behind Refine

I'm a big fan of SERI MATS. But my impression was that SERI MATS had a rather different pedagogy/structure (compared to Refine). In particular: 

  1. SERI MATS has an "apprenticeship model" (every mentee is matched with one mentor), whereas Refine mentees didn't have mentors.
  2. Refine was optimizing for people who could come up with "their own radically different agendas", whereas SERI MATS doesn't emphasize this. (Some mentors may encourage some of their mentees to think about new agendas, but my impression is that this varies a lot mentor-to-mentor, and it's not as baked into the overall culture). 
  3. SERI MATS has (thus far) nearly-exclusively recruited from the EA/rationality/AIS communities. Seems like Refine also did this, though I imagine that "Refine 2.0" would be more willing to recruit from outside these communities. (I'm not sure what SERI-MATS's stance is, but my impression is that their selection criteria heavily favor people who have existing work in AIS or existing connections. This of course makes sense, because past contributions/endorsements are a useful signal, unless the program is explicitly designed to go for oddball/weird/new/uncorrelated ideas). 

Two questions for you: 

  1. Are there any particular lessons/ideas from Refine that you expect (or hope) SERI MATS to incorporate?
  2. Do you think there's now a hole in the space that someone should consider filling (by making Refine 2.0), or do you expect that much of the value of Refine will be covered by SERI MATS [and other programs]?

Hastily written; may edit later

Thanks for mentioning this, Jan! We'd be happy to hear suggestions for additional judges. Feel free to email us at akash@alignmentawards.com and olivia@alignmentawards.com.  

Some additional thoughts:

  1. We chose judges primarily based on their expertise and (our perception of) their ability to evaluate submissions about goal misgeneralization and corrigibility. Lauro, Richard, Nate, and John are some of the few researchers who have thought substantially about these problems. In particular, Lauro first-authored the first paper about goal misgeneralization and Nate first-authored a foundational paper about corrigibility.
  2. We think the judges do have some reasonable credentials (e.g., Richard works at OpenAI, Lauro is a PhD student at the University of Cambridge, Nate Soares is the Executive Director of a research organization & he has an h-index of 12, as well as 500+ citations). I think the contest meets the bar of "having reasonably well-credentialed judges" but doesn't meet the bar of "having extremely well-credentialed judges" (e.g., well-established professors with thousands of citations). I think that's fine.
  3. We got feedback from several ML people before launching. We didn't get feedback that this looks "extremely weird" (though I'll note that research competitions in general are pretty unusual). 
  4. I think it's plausible that some people will find this extremely weird (especially people who judge things primarily based on the cumulative prestige of the associated parties & don't think that OpenAI/500 citations/Cambridge are enough), but I don't expect this to be a common reaction.

Some clarifications + quick thoughts on Sam’s points:

  1. The contest isn’t aimed primarily/exclusively at established ML researchers (though we are excited to receive submissions from any ML researchers who wish to participate). 
  2. We didn’t optimize our contest to attract established researchers. Our contests are optimized to take questions that we think are at the core of alignment research and present them in a (somewhat less vague) format that gets more people to think about them.
  3. We’re excited that other groups are running contests that are designed to attract established researchers & present different research questions. 
  4. All else equal, we think that precise/quantifiable grading criteria & a diverse panel of reviewers are preferable. However, in our view, many of the core problems in alignment (including goal misgeneralization and corrigibility) have not been sufficiently well-operationalized to have precise/quantifiable grading criteria at this stage.