What I’ll be doing at MIRI

evhub

Note: This is a personal post describing my own plans, not a post with actual research content.

Having finished my internship working with Paul Christiano and others at OpenAI, I’ll be moving to doing research at MIRI. I’ve decided to do research at MIRI because I believe MIRI will be the easiest, most convenient place for me to continue doing research in the near future. That being said, there are a couple of particular aspects of what I’ll be doing at MIRI that I think are worth being explicit about.

First, and most importantly, this decision does not represent any substantive change in my beliefs regarding AI safety. In particular, my research continues to be focused on solving inner alignment for amplification. My post on relaxed adversarial training continues to represent a fairly up-to-date form of what I think needs to be done along these lines.

Second, my research will remain public by default. I have discussed with MIRI their decision to make their research non-disclosed-by-default and we agreed that my research agenda is a reasonable exception. I strongly believe in the importance of collaborating with both the AI safety and machine learning communities and thus believe in the need for sharing research. Of course, I also fully believe in the importance of carefully reviewing possible harmful effects from publishing before disclosing results—and will continue to do so with all of my research—though I will attempt to publish anything I don’t believe to pose a meaningful risk.

Third—and this should go without saying—I fully anticipate continuing to collaborate with other researchers at other institutions such as OpenAI, Ought, CHAI, DeepMind, FHI, etc. The task of making AGI safe is a huge endeavor that I fully believe will require the joint work of an entire field. If you are interested in working with me on anything (regarding inner alignment or anything else) please don’t hesitate to send me an email at evanjhub@gmail.com.

I have discussed with MIRI their decision to make their research non-disclosed-by-default and we agreed that my research agenda is a reasonable exception.

Small note: my view of MIRI's nondisclosed-by-default policy is that if all researchers involved with a research program think it should obviously be public then it should obviously be public, and that doesn't require a bunch of bureaucracy. I think this while simultaneously predicting that when researchers have a part of themselves that feels uncertain or uneasy about whether their research should be public, they will find that there are large benefits to instituting a nondisclosed-by-default policy. But the policy is there to enable researchers, not to annoy them and make them jump through hoops.

(Caveat: within ML, it's still rare for risk-based nondisclosure to be treated as a real option, and many social incentives favor publishing-by-default. I want to be very clear that within the context of those incentives, I expect many people to jump to "this seems obviously safe to me" when the evidence doesn't warrant it. I think it's important to facilitate an environment where it's not just OK-on-paper but also socially-hedonic to decide against publishing, and I think that these decisions often warrant serious thought. The aim of MIRI's disclosure policy is to remove undue pressures to make publication decisions prematurely, not to override researchers' considered conclusions.)

It's very valuable that you're working full time on research, have a stable salary, and are in a geographical location conducive to talking with a lot of other thoughtful people who think a lot about these topics, and I'm pleased to hear these things are happening for you :-)

On the subject of privacy, I was recently reading the career plan of a friend who was looking for jobs in AI alignment, and I wrote this:

Do not accept secrets lightly. If you accept one wrong secret, you will go the way of MIRI or Leverage or US government officials with a security clearance, where it’s just too much effort to communicate your thoughts with outsiders, for fear of accidentally letting out secrets. And you’re not allowed to tell people the true reasons for your beliefs. It’s not that nobody should work at MIRI or Leverage or get confidential info from the government. It’s that accepting secrets raises the cost of participating in public discourse to the point where it’s not worth it for almost everyone involved.

It's really great to hear that you'll continue writing publicly, as I think the stuff you're doing is important and exciting, and there are strong distributed benefits for the broader landscape of people who work on AI alignment or want to.

Also feel free to come downstairs and hang out with us in the LessWrong offices :-)

Congratulations! :)

Do come visit our office in your basement sometimes.