How To Get Into Independent Research On Alignment/Agency

by johnswentworth, 19th Nov 2021

I’m an independent researcher working on AI alignment and the theory of agency. I’m 29 years old, will make about $90k this year, and set my own research agenda. I deal with basically zero academic bullshit - my grant applications each take about one day’s attention to write (and decisions typically come back in ~1 month), and I publish the bulk of my work right here on LessWrong/AF. Best of all, I work on some really cool technical problems which I expect are central to the future of humanity.

If your reaction to that is “Where can I sign up?”, then this post is for you.

Background Models

Independence

First things first: the “independent” part of “independent research” means self-employment, and everything that goes with it. It means the onus is on you to figure out what to do, how to provide value, what to prioritize, and what to aim for. In practice, it also usually means “independent” in a broader sense: you won’t have a standard template or agenda to follow. If you go down this path, assume that you will need to chart your own course - in particular, your own research agenda.

For the sort of person this post is aimed at, that will be a very big upside, not a downside.

Disclaimer: there are ways to get into alignment research which don’t involve quite so much figuring-it-all-out-on-your-own. Some people receive mentorship from existing researchers. Some people go work for alignment research organizations. Either of those paths can involve “independent research” in the sense that you are technically self-employed, but those paths aren’t “independent” in the broader sense of the word, and they’re not the main topic of this post.

Preparadigmicity

As a field, the study of alignment and agency is especially well-suited to independent research, because both center on problems we don’t understand. It’s not just that we don’t have the answers; we don’t even have the right frames for thinking about the problems. Agency is an area where we are fundamentally confused. AI alignment is largely a problem which hasn’t happened yet, on technology which hasn’t been invented yet, which we nonetheless want to solve in advance. Figuring out the right frames - the right paradigm - is itself a central part of the job.

The field needs people who are going to come up with new frames/approaches/models/paradigms/etc, because we’re pretty sure the current frames/approaches/models/paradigms/etc aren’t enough. Thus the great fit for independent research: as an independent researcher, you’re not beholden to some existing agenda based on existing frames. Coming up with your own idea of what the key problems are, how to frame them, what tools to apply… that sort of thing is exactly what we need, and it requires people who aren’t committed to the strategies of existing senior researchers and organizations. It requires people who have an independent high-level understanding of the field and different angles from which to look at it, and who can pick out the key problems and paths from that perspective.

Again, for the sort of person this post is aimed at, that will be a very big upside.

… but it comes with some trade-offs. As a historical example of preparadigmatic research, here’s Kuhn talking about optics before Newton:

Being able to take no common body of belief for granted, each writer on physical optics felt forced to build his field anew from its foundations. In doing so, his choice of supporting experiment and observation was relatively free, for there was no standard set of methods or of phenomena that every optical writer felt forced to employ and explain. Under these circumstances, the dialogue of the resulting books was often directed as much to the members of other schools as it was to nature.

This very much applies to alignment research. Because the field does not already have a set of shared frames - i.e. a paradigm - you will need to spend a lot of effort explaining your frames, tools, agenda, and strategy. For the field, such discussion is a necessary step to spreading ideas and eventually creating a paradigm. For you, it’s a necessary step to get paid, and to get useful engagement with your work from others.

In particular, you will probably need to both think and write a lot about your strategy: the models and intuitions which inform why you’re working on the particular problems you’ve chosen, why the tools you’re using seem promising, what kinds of results you expect, and what your long-term vision looks like. Inevitably, a lot of this will rely on informal arguments or intuitions; you will need to figure out how to trace the sources of those intuitions and explain them to other people, without having to formalize everything. Explain the actual process which led to an idea/decision/approach, without going down the bottomless rabbit hole of deeply researching every single claim.

The current version of LessWrong was built in large part to support exactly that sort of discussion, and I strongly recommend using it.

Getting Paid

Right now, the best grantmaker in this space is the Long-Term Future Fund (LTFF). There are other options, but none are quite as good a fit for the sort of work we’re talking about here.

I’ve received a few LTFF grants myself and know some of the people involved in the grantmaking decisions, so I’ll give some thoughts on the most important things you’ll need in order to get paid. Bear in mind that this is inherently speculative and not endorsed by anyone at LTFF. I’d also recommend looking at LTFF’s past grants to get a more direct idea of what kinds of things they fund.

Don’t Bullshit

A low-bullshit grantmaking process works both ways. The LTFF wants to do object-level useful things, not just Look Prestigious, so they keep the application simple and the turnaround time relatively fast. The flip side is that I expect them to look very unkindly on bullshit - i.e. attempts to make the applicant/application Sound Prestigious without actually doing object-level useful things.

In academia, it’s common practice to make up some bullshit about how your research is going to help the world. During my undergrad, this sort of bullshit was explicitly taught. Of course, it’s not like anyone is ever going to hire an economist or statistician (let alone consult a prediction market) to figure out whether the research is actually likely to impact the world in the manner claimed. The goal is just to make the proposal sound good. If you’re coming from academia, this sort of bullshit may be an ingrained habit which takes effort to break.

If you want to make it in alignment/agency research, you’re going to need an actual object-level strategy.

We’ll talk more in the next sections about how to come up with a strategy, but the first stop is The Bottom Line: once you’ve chosen a strategy, anything you say to justify it will not make it any more correct. All that matters is the process which originally made you choose that strategy, or made you stick to it at times when you might realistically have changed course. So first things first, forget whatever clever idea you already have cached, and let’s start from a blank slate.

Reading

Preparadigmicity means you’ll need to spend a lot of time explaining your choice of vision, strategy, models, tools, etc. The flip side of that coin is reading: you’ll probably need to read quite a bit of material from others in the field. This is often nontechnical or semi-technical background material, explanations of intuitions, vague gesturing at broad ideas, etc - you can see plenty of it here on LessWrong and the Alignment Forum. The more of this you read, the better you’ll understand other researchers’ frames (or at least know which frames you don’t understand), and the better you’ll be able to explain your own material in terms others can readily understand.

Early on, there are two main motivators for reading:

  • To understand which strategies have already been tried, and failed, to avoid retreading that ground
  • To understand a bit of the existing jargon (definitely not all of it!), in order to explain your own ideas in terms already familiar to others

To understand (some) existing approaches and jargon, I’d recommend at least skimming these sequences/posts, and diving deeper into whichever most resemble the directions you want to pursue:

To understand barriers (other than what’s discussed in the above links), this talk and the Rocket Alignment Problem are probably the best starting points. Note that lots of people disagree with those last two links (as well as 11 Proposals), but you probably want to be at least familiar enough to have an informed disagreement.

Note that this is all on LessWrong, which means you can leave comments with questions, attempts to summarize, disagreements, etc. Often people will reply. This helps a lot for actually absorbing the ideas. (h/t Adam Shimi for pointing this out.)

I invite others to leave suggested reading in the comments. (This does risk turning into a big debate over whether X or Y is actually a good idea for new people, but at least then we’ll have a realistic demonstration of how much everybody disagrees over all this. I did warn you that the field is preparadigmatic!)

Finally, there’s The Sequences. They are long, but if you haven’t read them, then you definitely risk various failure modes which will be obvious to people who have read them and very confusing to you. I wouldn’t quite say they’re required reading, especially if you’re on the more technical end of the spectrum and already somewhat familiar with alignment discussions, but there are definitely many people who will be somewhat surprised if you do technical alignment/agency research and haven’t read them.

Again, I want to emphasize that everyone disagrees on all this stuff. Roughly speaking, assume that the grantmakers care more about your research having some plausible path to usefulness than about agreeing with any particular position in any of the field’s ongoing arguments.

The Hamming Question

Over on the other side of the dining hall was a chemistry table. I had worked with one of the fellows, Dave McCall; furthermore he was courting our secretary at the time. I went over and said, "Do you mind if I join you?" They can't say no, so I started eating with them for a while. And I started asking, "What are the important problems of your field?" And after a week or so, "What important problems are you working on?" And after some more time I came in one day and said, "If what you are doing is not important, and if you don't think it is going to lead to something important, why are you at Bell Labs working on it?" I wasn't welcomed after that; I had to find somebody else to eat with!

Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks. The standard (and strongly recommended) exercise to alleviate that problem is to start from the Hamming Questions:

  • What are the most important problems in your field (i.e. alignment/agency)?
  • How are you going to solve them?

At this point, somebody usually complains that minor contributions are important or some such. I’m not going to argue with that, because I expect the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

If you have decent answers to the Hamming Questions, and you make those answers clear to other people, that is probably a sufficient condition for your grant application to not end up in the giant pile of applications from people who don’t even have a model of how their proposal will help. It’s not quite a sufficient condition to get paid, but I would guess that a large majority of people who can clearly answer the Hamming Questions do get paid.

I want to emphasize that I think clear answers to the Hamming Questions are an approximately-sufficient condition, not an approximately-necessary condition; there are definitely other paths. Steve’s story in the comments below is a good example; in his words:

If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P

Use Your Pareto Frontier

A great line from Adam Shimi:

Most people who try to go in a direction 'no one else has tried' end up going in the most obvious direction which everyone else has tried.

My main advice to avoid this failure mode is to leverage your Pareto frontier. Apply whatever knowledge, or combination of knowledge, you have which others in the field don’t. Personally, I’ve gained a lot of insight into agency by drawing on systems biology, economics, statistical mechanics, and chaos theory. Others draw heavily on abstract math, like category theory or model theory. Evolutionary biology and user interface design are both rich sources.

This is one reason why it helps to have a broad technical background: the more frames and tools you have to draw on, the more likely you’ll find a novel and promising combination to apply to the most important problems in the field. (Or, just as good: the more frames and tools you have to draw on, the more likely you’ll notice that one of the most important problems has been overlooked.)

Flip side of this: if you have a novel-seeming idea which involves the same kinds of frames and tools which most people in alignment have (i.e. programming expertise, some ML experience, reading Astral Codex Ten) then do write it up, but don’t be surprised if it’s already been done.

If you read through some existing alignment work, and the strategy seems obviously wrong to you in a way which would not be obvious to the median LessWrong user, then that’s a very promising sign.

Legibility

Part of getting a grant is not just having a good plan and the skills to execute it, but making your plan and skills legible to the people reviewing the grant.

Here’s (my summary of) a rough model from Oli, who’s one of the fund managers for LTFF. In order to get a grant for alignment research, usually someone needs to do one of these three:

  1. Write a grant application which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (This is rare/difficult.)
  2. Have a reference from someone the fund managers know and trust (i.e. the existing alignment research community).
  3. Have some visible online material which clearly signals that they understand the alignment problem and have a non-bullshitted research strategy. (LessWrong posts/comments are a central example.)

As a new entrant to the field, I expect that option #3 is probably your main path. Write up not just your research strategy, but the intuitions, models and arguments behind that strategy. Give examples. Explain what you consider the key problems, why those problems seem central, and the frames and generators behind that reasoning. Again, give examples. Explain conjectures or tools you think are relevant, ideally with examples. If you’re on the theory side, sketch potential empirical tests; if on the empirical side, sketch the conceptual theory behind the ideas. And include examples. Explain your vision of success, and expected applications of your research (if it succeeds). At all stages, focus on giving accessible, intuitive explanations and lots of examples; even people who have lots of technical background will often skip over sections with just dense math, and not everyone has the same technical background as you. And put the examples at the beginnings of the posts, before the abstract/general explanations.

Remember: this is preparadigmatic work. Writing up the ideas, and the generators of the ideas, and the frames, and the tools, and making it all clear and accessible to people with totally different frames and tools, is a central part of the job.

All this writing will also make options #1 and #2 easier over time: writing a lot of posts and comments will eventually generate social connections (though this takes quite a bit of time, especially if you’re not in the Bay Area), and discussion/feedback will give some idea of how to explain things in a way which signals the kinds-of-things LTFF looks for.

(On the topic of feedback: a lot of more experienced researchers ignore most posts which they don't find very promising, partly because it’s a lot of work to explain/argue about problems and partly because there are too many posts to read it all anyway. If you explicitly reach out - e.g. send a message on LessWrong - and ask for feedback, people are much more likely to tell you what they think.)

By the time all that is written up and posted, the grant application itself is a drop in the bucket; that’s a big part of why it only takes a day to write up. A quote from Oli regarding the actual application:

I really wish people would just pretend they’re writing me an email explaining what they plan to do, rather than something aimed at the general public.

This is part of why option #1 is rare - people try to write the LTFF application like it’s an academic grant application or something, and it really isn’t. But also, clear communication is just pretty hard in general, even when you do understand the problem and have a non-bullshitted strategy.

When To Start

This post was mostly written for people who already have the technical skills they need. That probably means grad-level education, though a PhD is definitely not a formal requirement. I know at least a few people who think less-than-a-full-undergrad can suffice. Personally, I never went to grad school (though admittedly my undergrad coursework looks an awful lot like a PhD program; I got an unusually large amount of mileage out of it).

In terms of specific skills, I recently wrote a study guide with a bunch of technical topics I’ve found useful, but the more important point is that we don’t currently know what the right combination of background knowledge is. If you already have a broad technical background, then my advice is to take a stab at the problem and see how it goes.

If you are currently in high school or undergrad, the study guide has some recommendations for what to study (and why). The larger your knowledge base, the more tools and frames you’ll have to draw on later. You could also apply for a grant to e.g. pursue some alignment/agency research project over the summer; taking a stab at it will give you some firsthand data on what kinds of tools/frames are useful.

Runway

The grant application takes maybe a day, but there will probably be some groundwork before you’re ready for that. You’ll probably want to read a bunch, figure out a strategy, put up a few posts on it, and maybe update in response to feedback.

Personally, I quit my job as a data scientist in late 2018, and tried out a few different things over the course of the next year before settling into alignment/agency research. I got my first grant in late 2019. If someone with roughly my 2018 level of background knew up front that they wanted to enter the field, I think it would take a lot less time than that; a few months would be my guess. That said, my level of background in 2018 was already well above zero.

I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback. That said, you should probably plan on going full time at latest by the time you get a grant, and possibly sooner. If you’re in academia, then you’ll probably have more room to aim the bulk of your research at alignment without striking out on your own. (Though you should still totally strike out on your own and enjoy the no-academic-bullshit lifestyle.)

Meta

Historically, EA causes (including alignment) have largely drawn from very young populations (mostly undergrads). I believe this is mostly because (a) those are the people who don’t need to be drawn away from a different path which they’re already on, (b) they’re willing to work for peanuts, and (c) they don’t have to unlearn how to bullshit. Unfortunately, a lot of alignment research benefits from a broad technical background, which takes time to build up. So I think we’ve historically had fewer researchers with that sort of broad knowledge than would be ideal, just because we tend to recruit young people.

But conditions have changed in recent years, and I think there’s now room for a different kind of recruitment, aimed at (somewhat) older people with more knowledge and experience.

First: the Sequences are about ten years old, so right about now there are probably a bunch of postgrads and adjunct professors with lots of technical skills who have already read them, have decent epistemic habits (i.e. know how to not bullshit), and have a rough understanding of what the alignment problem is.

Second: nowadays, we have money. If you’re a postgrad or adjunct professor or whatever, and you can do good technical alignment research, you can probably make more money as an independent researcher in alignment than you do now. Our main grantmaker has an application form which takes maybe a few hours at most, usually comes back with a decision in under a month, and complains that it doesn’t have enough good projects to spend its money on.

So if you’re the sort of person who:

  • Wants to tackle big open research problems
  • … in a field where everyone is confused and we don’t have a paradigm yet and you have to basically chart your own course
  • … and the stakes are literally astronomical
  • … and you have a bunch of technical skills, maybe read the Sequences ten years ago, and have a basic understanding of what AI alignment is and why it’s hard

… then now is a good time to sit down with a notebook and think about how you’d go about understanding alignment/agency. If you have any promising ideas, write them up, post them here on LessWrong, and apply for a grant to pursue this research full-time.

I can attest that it’s an awesome job.

the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

For me, there were two separate decisions. (1) Around March 2019, having just finished my previous intense long-term internet hobby, I figured my next intense long-term internet hobby was gonna be AI alignment; (2) later on, around June 2020, I started trying to get funding for full-time independent work. (I couldn't work at an org because I didn't want to move to a different city.)

I want to emphasize that at the earlier decision-point, I was absolutely "aiming for minor contributions". I didn't have great qualifications, or familiarity with the field, or a lot of time. But I figured that I could eventually get to a point where I could write helpful comments on other people's blog posts. And that would be my contribution!

Well, I also figured I should be capable of pedagogy and outreach. And that was basically the first thing I did—I wrote a little talk summarizing the field for newbies, and gave it to one audience, and tried and failed to give it to a second audience.

(I find it a lot easier to "study topic X, in order to do Y with that knowledge", compared to "study topic X" full stop. Just starting out on my new hobby, I had no Y yet, so "giving a pedagogical talk" was an obvious-to-me choice of Y.)

Then I had some original ideas! And blogged about them. But they turned out to be bad.

Then I had different original ideas! And blogged about them in my free time for like a year before I applied for LTFF.

…and they rejected me. On the plus side, their rejection came with advice about exactly what I was missing if I wanted to reapply. On the minus side, the advice was pretty hard to follow, given my time constraints. So I started gradually chipping away at the path towards getting those things done. But luckily I wound up getting a different grant a few months later (yay).

With that background, a few comments on the post:

I wrote a fair bit on LessWrong, and researched some agency problems, even before quitting my job. I do expect it helps to “ease into it” this way, and if you’re coming in fresh you should probably give yourself extra time to start writing up ideas, following the field, and getting feedback.

I also went down the "ease into it" path. It's especially (though not exclusively) suitable for people like me who are OK with long-term intense internet hobbies. (AI alignment was my 4th long-term intense internet hobby in my lifetime. Probably last. They are frankly pretty exhausting, especially with a full-time job and kids.)

Probably the most common mistake people make when first attempting to enter the alignment/agency research field is to not have any model at all of the main bottlenecks to alignment, or how their work will address those bottlenecks.

Just to clarify:

This quote makes sense to me if you read "when first attempting to enter the field" as meaning "when first attempting to enter the field as a grant-funded full-time independent researcher".

On the other hand, when you're first attempting to learn about and maybe dabble in the field, well obviously you won't have a good model of the field yet.

One more thing:

the sort of person who this post is already aimed at (i.e. people who are excited to forge their own path in a technical field where everyone is fundamentally confused) is probably not the sort of person who is aiming for minor contributions anyway.

If you're a kinda imposter-syndrome-y person who just constitutionally wouldn't dream of looking themselves in the mirror and saying "I am aiming for a major contribution!", well me too, and don't let John scare you off. :-P

I can attest that it’s an awesome job.

I agree!

I love this post. Thanks, John.

Curated. This post matched my own models of how folk tend to get into independent alignment research, and I've seen some people whose models I trust more endorse the post as well. Scaling good independent alignment research seems very important.

I do like that the post also specifies who shouldn't be going into independent research.

Hi John, thanks a lot.

Your posts are coming at the perfect time. I just gave notice at my current job, and I have about 3 years of runway ahead of me in which I can do whatever I want. I should definitely at least evaluate AI Safety research. My background is a bachelor's in AI (that's a thing in the Netherlands). The little bits of research I did try got good feedback.

Even though I'm in a great position to try this, it still feels like a huge gamble. I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?

Not just asking to reduce financial risk, but also because I feel like my learning trajectory would be quite different if I already knew that it was going to work out in the long run. I'd be able to study the fundamentals a lot more before trying research.

I'm aware that a lot of AI Safety research is already of questionable quality. So my question is: how can I determine as quickly as possible whether I'm cut out for this?

My key comment here is that, to be an independent researcher, you will have to rely day-by-day on your own judgement on what has quality and what is valuable. So do you think you have such judgement and could develop it further?

To find out, I suggest you skim a bunch of alignment research agendas, or research overviews like this one, and then read some abstracts/first pages of papers mentioned in there, while trying to apply your personal, somewhat intuitive judgement to decide

  • which agenda item/approach looks most promising to you as an actual method for improving alignment

  • which agenda item/approach you feel you could contribute most to, based on your own skills.

If your personal intuitive judgement tells you nothing about the above questions, if it all looks the same to you, then you are probably not cut out to be an independent alignment researcher.

As nobody else has mentioned it yet in this comment section: AI Safety Support is a resource hub specifically set up to help people get into the alignment research field.

I am a 50-year-old independent alignment researcher. I guess I need to mention for the record that I never read the Sequences, and do not plan to. The piece of Yudkowsky writing that I'd recommend everybody interested in alignment should read is Corrigibility. But in general: read broadly, and also beyond this forum.

I agree with John's observation that some parts of alignment research are especially well-suited to independent researchers, because they are about coming up with new frames/approaches/models/paradigms/etc.

But I would like to add a word of warning. Here are two somewhat equally valid ways to interpret LessWrong/Alignment Forum:

  1. It is a very big tent that welcomes every new idea

  2. It is a social media hang-out for AI alignment researchers who prefer to engage only with particular alignment sub-problems and particular styles of doing alignment research.

So while I agree with John's call for more independent researchers developing good new ideas, I need to warn you that your good new ideas may not automatically trigger a lot of interest or feedback on this forum. Don't tie your sense of self-worth too strongly to this forum.

On avoiding bullshit: discussions on this forum are often a lot better than on some other social media sites, but Sturgeon's law still applies.