Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

I think this is really exciting and I’m very interested to see how it goes. I think the current set of problems and methodologies is solid enough that participants have a reasonable shot at making meaningful progress within a month. I also expect this to be a useful way to learn about language models and to generally be in a better position to think about alignment.

I think we’re still a long way from understanding model behavior well enough that we could e.g. rule out deceptive alignment, but it feels to me like recent work on LM interpretability is making real progress towards that goal, and I can imagine having large teams studying frontier models closely enough to robustly notice deceptive alignment well in advance by the time we have transformative AI.

Neel Nanda

I'm really excited about this program! Super curious to see what comes out of it - I expect I'll learn a lot whether it goes well, or struggles to get traction. And I want to see more of this kind of ambitious scalable alignment effort!

If you're interested in getting into mechanistic interpretability work, you should definitely apply.

Ben Pace

I was trying to figure out whether someone who is just here for the month of November should apply. I think the answer is no, but I am broadly a bit confused when this is a commitment for.

Also, are people going through as cohorts or will they start with the training week whenever they show up, not necessarily in-sync with anyone else?

Also, is the idea to be doing self-directed research by default, or research in collaboration with Redwood staff by default? I don't know what my default action is day-to-day during this program. Do I have to come in with a bunch of research plans already?

Buck

Thanks for the questions :)

"I was trying to figure out whether someone who is just here for the month of November should apply. I think the answer is no,"

Probably no.

"but I am broadly a bit confused when this is a commitment for."

Yeah, we haven't totally settled this yet; the application form asks a lot of questions about availability. I think the simplest specific answer is "you probably have to be available in January, and it would be cool if you were available earlier and wanted to get here earlier and do this for longer".

"Also, are people going through as cohorts or will they start with the training week whenever they show up, not necessarily in-sync with anyone else?"

Not totally settled. We'll probably have most people in a big final cohort in January, and we'll try to have people who arrive earlier show up at synced times so that they can do the training week with others.

"Also, is the idea to be doing self-directed research by default, or research in collaboration with Redwood staff by default? I don't know what my default action is day-to-day during this program. Do I have to come in with a bunch of research plans already?"

The default is to do research directed by Redwood staff. You do not need to come in with any research plans.

AI ALIGNMENT FORUM
AF

Goals

Why you should apply:

Why we’re doing this:

Why do this now?

Who should apply?

What is doing this sort of research like?

Schedule

Miscellaneous Notes

FAQ

What if I can’t make these dates?

Am I eligible if I’m not sure I want to do interpretability research long-term?

What’s the application process?

Can you sponsor visas?

Is this research promising enough to justify running this program?

How useful is this kind of interpretability research for understanding models that might pose an existential risk?