More people getting into AI safety should do a PhD

AdamGleave

This is a linkpost for https://gleave.me/post/why-do-phd/

Doing a PhD is a strong option to get great at developing and evaluating research ideas. These skills are necessary to become an AI safety research lead, one of the key talent bottlenecks in AI safety, and are helpful in a variety of other roles. By contrast, my impression is that currently many individuals with the goal of being a research lead pursue options like independent research or engineering-focused positions instead of doing a PhD. This post details the reasons I believe these alternatives are usually much worse at training people to be research leads.

I think many early-career researchers in AI safety are undervaluing PhDs. Anecdotally, I think it’s noteworthy that people in the AI safety community were often surprised to find out I was doing a PhD, and positively shocked when I told them I was having a great experience. In addition, I expect many of the negatives attributed to PhDs are really negatives of any pathway involving the open-ended, exploratory research that is key to becoming a research lead.

I am not arguing that most people contributing to AI safety should do PhDs. In fact, a PhD is not the best preparation for the majority of roles. If you want to become a really strong empirical research contributor, then start working as a research engineer on a great team: you will learn how to execute and implement faster than in a PhD. There are also a variety of key roles in communications, project management, field building and operations where a PhD is of limited use. But I believe a PhD is excellent preparation for becoming a research lead with your own distinctive research direction that you can clearly communicate and ultimately supervise junior researchers to work on.

However, career paths are highly individual and involve myriad trade-offs. Doing a PhD may or may not be the right path for any individual person: I simply think it has a better track record than most alternatives, and so should be the default for most people. In the post I’ll also consider counter-arguments to a PhD, as well as reasons why particular people might be better fits for alternative options. I also discuss how to make the most of a PhD if you do decide to pursue this route.

Author Contributions: This post primarily reflects the opinion of Adam Gleave, so it is written in the first person (“I”). Alejandro Ortega and Sean McGowan made substantial contributions writing the initial draft of the post based on informal conversations with Adam. The resulting draft was then lightly edited by Adam, incorporating feedback and suggestions from Euan McLean and Siao Si Looi.

Why be a research lead?

AI safety progress can be substantially accelerated by people who can develop and evaluate new ideas, and mentor new people to develop this skill. Other skills are also in high demand, such as entrepreneurial ability, people management and ML engineering. But being one of the few researchers who can develop a compelling new agenda is one of the best roles to fill. This ability also pairs well with other skills: for example, someone with a distinct agenda who is also entrepreneurial would be well placed to start a new organisation.

Inspired by Rohin Shah’s terminology, I will call this kind of person a research lead: someone who generates (and filters) research ideas and determines how to respond to results.

Research leads are expected to propose and lead research projects. They need strong knowledge of AI alignment and ML. They also need to be at least competent at executing on research projects: for empirically focused projects, this means adequate programming and ML engineering ability, whereas a theory lead would need stronger mathematical ability. However, what really distinguishes research leads is that they are very strong at developing research agendas: i.e., generating novel research ideas and then evaluating them so the best ideas can be prioritized.

This skill is difficult to acquire. It can take a long time, and it doesn’t happen by default. Moreover, you can’t directly aim for developing this skill: just being an “ideas person” in a highly technical field rarely pans out. You need to get your hands dirty working on a variety of research projects and trying out different ideas to learn what does and doesn’t work. Being a really strong ML engineer or mathematician will help a lot since you can iterate faster and test out more ideas, but this only gets you more “training data”; you still have to learn from it. Apart from experience and iteration speed, the things that seem to matter most for getting good at research agenda generation are the people you’re surrounded by (peers and mentors) and the environment (e.g. are you supported in trying out new, untested ideas?).

It may not be worth becoming a research lead under many worldviews. For one, there’s a large time cost: it typically takes around 5 years to gain the requisite skills and experience. So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, with a 10-year timeframe things look much stronger: you would still have around 5 years to contribute as a research lead. Another possibility is that creating more AI safety agendas may not be that useful. If the current AI safety approaches are more or less enough, the most valuable work may lie in implementing and scaling them up.

In the rest of the post, we’ll assume your goal is to become a research lead and learn to generate great research agendas. The main options available to you are a PhD, working as a research contributor, or independent research. What are the main considerations for and against each of these options?

Why do a PhD?

People

Having a mentor is a key part of getting good at generating research agendas. Empirically testing an idea could easily take you 6 months of work. But an experienced mentor should immediately have a sense of how promising the idea is, and so be able to steer you away from dead ends. This lets you massively increase the amount of training data you get: rather than getting meaningful feedback every 6 months when you finish a project, you get it every week you propose an idea.
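As a rough back-of-the-envelope illustration (assuming about 26 weeks in 6 months and counting each weekly proposal and each finished project as one unit of feedback, both simplifications), weekly mentor feedback yields roughly 26 times as many feedback signals:

```latex
\frac{\text{feedback signals per 6 months (weekly proposals)}}{\text{feedback signals per 6 months (finished projects)}}
\approx \frac{26}{1} = 26
```

This only counts signals; it does not account for a mentor's prediction being less reliable than an actual experimental result.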

You don’t just get to learn from your advisor’s predictions of project outcomes, but also from the reasoning behind them. In fact, you probably want to learn to predict the judgement and reasoning of as many good researchers as you can – not just your official advisor, but other professors, post-docs, promising senior PhD students, and so on. Over time, you’ll learn to analyze research projects from a variety of different frames. At some point, you’ll probably find that many of these frames, as well as your own judgement, disagree with your advisor – congratulations, you’re now on your way to being a research lead. A (good) advisor is more of a mentor than a boss, so you will have the freedom to try different things.

For this reason, it matters enormously where you do your PhD: if you are surrounded by mediocre researchers, your learning opportunities will be significantly diminished. However, universities still have some of the best AI talent in the world: professors in top departments are leaders in the field and have 10+ years of research experience. They are comparable to senior team leads at the top industry labs. If you can get directly advised by a professor of this calibre, that’s a great deal for you.

Environment

Within a PhD program you’re incentivized to come up with your own research ideas and execute on them. Moreover, the program guarantees at least some mentorship from your supervisor. Your advisor’s incentives are reasonably aligned with yours: they get judged by your success in general, so want to see you publish well-recognized first-author research, land a top research job after graduation and generally make a name for yourself (and by extension, them).

Doing a PhD also pushes you to learn how to communicate with the broader ML research community. The “publish or perish” imperative means you’ll get good at writing conference papers and defending your work. This is important if you want your research to get noticed outside of a narrow group of people such as your colleagues or LessWrong readers. It’ll also help you influence other ML researchers’ work and build a consensus that safety is important.

You'll also have an unusual degree of autonomy: You’re basically guaranteed funding and a moderately supportive environment for 3-5 years, and if you have a hands-off advisor you can work on pretty much any research topic. This is enough time to try two or more ambitious and risky agendas.

But freedom can be a double-edged sword. Some people struggle with the lack of structure, and a lot of people fritter the opportunity away doing safe, incremental work. But if you grasp it, this is an excellent opportunity.

Alternatives to PhDs

Doing independent research

As an independent researcher, you get time to think and work on ideas. And you’ll feel none of the bad incentives that industry or academia place on you.

But by default you’ll be working alone and without a mentor. Both of these things are bad.

Working by yourself is bad for motivation. Entrepreneurs are unusually self-motivated and have “grit”, but are still strongly recommended to find a co-founder. If you think being isolated doesn’t matter, you’re probably fooling yourself.

Moreover, without a mentor your feedback loop will be a lot longer: rather than getting regular feedback on your ideas and early-stage results, you’ll need to develop your research ideas to the point where you can tell if they’re panning out or not. In some fields like mechanistic interpretability that have fast empirical feedback loops this may be only a modest cost. In research fields with longer implementation times or a lack of empirical feedback, this will be much more costly.

And mentor-time is hard to come by. There aren’t many people who are able to i) impart the skills of research idea generation and evaluation and ii) donate enough time to actually help you learn good taste. That’s not to say it isn’t possible to find someone happy to mentor you, but getting comments on your Google Docs every 3 months is unlikely to be good enough. I think an hour every other week is the minimum mentorship most people need, although some people are exceptionally quick independent learners.

Working as a research contributor

As a research contributor you execute on other people’s ideas, for example as a research engineer in an industry lab. This is often an excellent way of getting good at execution as well as learning some basic research skills. But it is not usually sufficient for getting good at developing research agendas.

Industry lab agendas are often set top-down, so your manager likely won’t give you opportunities to practice exploring your own research ideas. It’s also worth noting that most research leads at these organizations seem to have PhDs anyway. But that’s not to say there aren’t firms or teams where working as a research engineer would be better than doing a PhD.

Similarly, non-profit alignment organizations (like Redwood, Apollo, METR, ARC) often have pre-set research agendas. Furthermore, these organizations are often staffed by more junior researchers, who may not be able to provide good mentorship.

Working as an RA at an academic lab also usually involves just executing on other people’s ideas. However, it is a bit better optimized for PhD applications: Professors are well-placed to write a strong recommendation letter, and RA projects are usually designed to be publishable.

Working as a research contributor can be a good starting point for the first year or two of a prospective research lead’s career. In particular, engineering skills are often acquired faster and better in a company than a PhD. So even if a PhD is your end goal, it may be worth spending some time in a research contributor role. Indeed, many well-run academic labs more or less have an apprentice system where junior PhD students will initially work closely with more senior PhD students or post-docs before they can operate more independently. Starting a PhD a bit later but with greater independence could let you skip this step.

However, if you do opt to start working as a research contributor, choose your role carefully. You’ll want to ensure you develop a strong PhD portfolio (think: can you publish in this role, and get a strong recommendation letter?). Additionally, be honest with yourself as to whether you’ll be willing to take a pay cut in the future. Going from undergraduate to a PhD will feel like getting richer, whereas going from an industry role to a PhD will involve taking a massive pay cut. Although you might have a higher standard of living with supplemental savings from an industry role, it won’t feel like you do. Setting yourself a relatively strict budget to prevent your expenses expanding to fill your (temporarily elevated) salary can help here.

Things to be wary of when doing a PhD

Although I am in favour of more people doing PhDs, I do think they fall far short of an ideal research training program. In particular, the quality of mentorship varies significantly between advisors. Many PhD students experience mental health issues during their programme, often with limited support.

I think most criticisms of PhDs are correct, but as it currently stands the other options are usually worse than PhDs. We’d be excited to see people develop alternative, better ways of becoming research leads, but until that happens I think people should not be discouraged from doing PhDs.

Your work might have nothing to do with safety

By default, a PhD will do an excellent job at training you to predict the outcome of a research project and getting research ideas to work. But it will do very little to help you judge whether the outcome of a research project actually matters for safety. In other words, PhDs do not train you to evaluate the theory of impact for a research project.

Academic incentives are mostly unrelated to real-world impact. The exception is if you’re in a program where other students or academics care about alignment, where you’ll probably get practice at evaluating theories of impact. See below if you want some specific recommendations on how to make this happen.

But for most people, this won’t be the case and you’ll have to supplement with other sources. The easiest way is to attend AI safety focused conferences and workshops, co-work from an AI safety hub (mostly located in the SF Bay Area & London) and/or intern at an AI safety non-profit or an industry org’s safety team.

Your mental health might suffer

The mental health of graduate students is notoriously bad. Some PhD programs are better than others at giving students more guidance early on, or training supervisors to be better at management. But even in the best case, learning how to do research is hard. If you think you are high-risk for mental health issues, then you should choose your PhD program and advisor carefully, and may want to seriously consider alternatives to a PhD.

Anecdotally, it seems like mental health amongst independent researchers or in some alignment non-profits might be as bad as in PhD programs. However, mental health is often better in more structured roles, and at organizations that champion a healthy management culture.

So what should you do?

There are multiple options available to get good at developing research agendas and I am definitely not suggesting that doing a PhD is the correct choice for everyone. Weighing up what’s best for you to do will depend on your background and history.

But it’ll also depend on what specific options you have available to you. We’d stress that it’s worth exploring multiple paths (e.g. PhD and research engineering) in parallel. Even if one path is on average more impactful or a better fit for you, the best option in a given track usually dwarfs the median option in other tracks. Doing a PhD might be better for most people, but working as an ML engineer at a top AI safety non-profit probably beats doing a PhD at a low-ranked program with no one working on safety.

To try and work out how good a PhD is likely to be, ask:

How good a researcher is your supervisor?
How good a mentor are they? (Visit their lab and ask current grad students!)
How interested are they in AI Safety?
How much flexibility do you have to choose your own projects?

If you’re doing independent research, then ask:

Do you already have most of the skills needed for this research project?
Have you thrived in independent environments with limited accountability in the past?
Do you already have a research track record?
What are your sources of mentorship and feedback? How much of their time are they able to give?

Advice for making the most of a PhD

Improving execution: I would suggest starting by prioritizing getting high-bandwidth, object-level feedback from mentors to improve your execution and general knowledge of the field. You could get this by working with a junior professor who has a lot of time, or a post-doc or senior PhD student. You'll learn a lot about how to execute on a project, including implementation, experiments, and write-up. At this point it’s fine to work on other people's ideas, and on non-safety projects.

Improving idea generation: In the background, read up on safety and try to keep an eye on what's going on. Form opinions on what's good and bad, and what’s missing. Keep a list of ideas and don't worry too much if they're good or bad. Flesh out the ones you think are best into one to two page proposals. Ask safety researchers for feedback on your theory of change, and ask non-safety AI researchers for feedback on general tractability and technical interest.

Improving idea evaluation: If other students or academics in your program are interested in alignment, you could set up a reading group. One format which seems to work well is for one person to go deep on the research agenda of another safety researcher, and to start the meeting by explaining and justifying this agenda. Then the rest of the meeting is the group engaging in spirited debate and discussion about the agenda. This feels less personal than if the agenda of someone in the room is being critiqued.

I also sometimes recommend a reading group format where people present their own ideas and get feedback. I think it's good if these are low-stakes – for example, where the norm is that it’s acceptable to present half-baked ideas. It's easy to get demotivated if you put a lot of work into an idea and it gets shot down. Another good format is cycles of "clarify, correct, critique", where you start by understanding what someone else is proposing, try to improve/correct any issues with it, then critique this stronger version of it.

Increase your independence: After the first year or two (depending on how much prior experience you have and how long the PhD program is), switch to working more on your own ideas and working autonomously. Now it's time to put the pieces together. Your time spent ideating and evaluating will have given you a list of ideas that are safety-relevant and which you & your advisor agree are strong. Your time spent developing execution skills will have enabled you to rapidly test these ideas.

Increase your ambition: Gradually start being more ambitious. Rather than aiming for individual project ideas, can you start to craft an overarching agenda? What is your worldview, and how does it differ from others? This won't happen overnight, so thinking about this little and often might be the best approach.

Conclusion

Doing a PhD is usually the best way to get great at the key skills of generating and evaluating research ideas. At a top PhD program you’ll be mentored by world-class researchers and get practice developing and executing on your own ideas. PhD programs are by no means ideal, but I think they are usually the best option for those aiming to be one of the few researchers who can develop a compelling, new research agenda.

In particular, I think most people are unlikely to become research leads by working as a research contributor or by doing independent research. However, other roles can make equal or greater contributions to AI safety research, and there are a number of reasons why doing a PhD might not be the best option for any individual person.

Oliver Habryka (1mo)

Hmm, it feels to me this misses the most important objection to PhDs, which is that many PhDs seem to teach their students actively bad methodologies and inference methods, sometimes incentivize students to commit scientific fraud, teach writing habits that are optimized to obscure and sound smart instead of aiming to explain clearly and straightforwardly, and often seem to produce zero-sum ideas around ownership of work and intellectual ideas that seem pretty bad for a research field.

To be clear, there are many PhD opportunities that do not have these problems, but many of them do, and it seems to me quite important to somehow identify PhD opportunities that do not have this problem. If you only have the choice to do a PhD under an advisor who does not seem to you to be actually good at producing clear, honest and high-quality research while acting in high-integrity ways around their colleagues, then I think almost any other job will be better preparation for a research career.

AdamGleave (1mo)

I'm sympathetic to a lot of this critique. I agree that prospective students should strive to find an advisor that is "good at producing clear, honest and high-quality research while acting in high-integrity ways around their colleagues". There are enough of these you should be able to find one, and it doesn't seem worth compromising.

Concretely, I'd definitely recommend digging into an advisor's research and asking their students hard questions prior to taking any particular PhD offer. There absolutely are labs that prioritize publishing above all else, turn a blind eye to academic fraud or at least brush accidental non-replicability under the rug, or just have a toxic culture. You want to avoid those at all costs.

But I disagree with the punchline that if this bar isn't satisfied then "almost any other job will be better preparation for a research career". In particular, I think there are a ton of concrete skills a PhD teaches that don't need a stellar advisor. For example, there are some remarkably simple things like having an experimental baseline, running multiple seeds and reporting confidence intervals that a PhD will absolutely drill into you. These things are surprisingly often missing from research produced by those I see in the AI safety ecosystem who have not done a PhD or been closely mentored by an experienced researcher.

Additionally, I've seen plenty of people do PhDs under an advisor who lacks one or more of these properties and most of them turned out to be fine researchers. Hard to say what the counterfactual is, the admission process to the PhD might be doing a lot of work here, but I think it's important to recognize the advisor is only one of many sources of mentorship and support you get in a PhD: you also have taught classes, your lab mates, your extended cohort, senior post-docs, peer review, etc. To be clear, none of these mentorship sources are perfect, but part of your job as a student is to decide who to listen to & when. If someone can't do that then they'll probably not get very far as a researcher no matter what environment they're in.
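The practices mentioned in the comment above (an experimental baseline, multiple seeds, confidence intervals) can be sketched in a few lines. This is a minimal illustration, not a prescribed method: `run_experiment` is a hypothetical stand-in for a real training run with made-up score distributions, and the interval uses a percentile bootstrap from the Python standard library, which is one of several reasonable choices (a t-interval is another common one when there are only a handful of seeds).

```python
import random
import statistics


def run_experiment(method: str, seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a noisy score.

    Assumption for illustration only: the "new" method has a slightly higher
    true mean than the "baseline". Replace this with your real experiment.
    """
    rng = random.Random(f"{method}-{seed}")  # deterministic per (method, seed)
    true_mean = 0.75 if method == "new" else 0.70
    return rng.gauss(true_mean, 0.03)


def bootstrap_ci(samples, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Always run the baseline alongside the new method, over several seeds.
seeds = range(10)
results = {m: [run_experiment(m, s) for s in seeds] for m in ("baseline", "new")}

for method, scores in results.items():
    lo, hi = bootstrap_ci(scores)
    print(f"{method}: mean={statistics.fmean(scores):.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Reporting the baseline's interval next to the new method's makes it easy to see whether an apparent improvement is larger than seed-to-seed noise.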

OliverHayman (1mo)

How often do people not do PhDs on the basis that they don't teach you to be a good researcher? Perhaps this is different in certain circles, but almost everyone I know doesn't want to do a PhD for personal reasons (and also timelines).

The most common objections are the following:

PhDs are very depressing and not very well paid.
Advisors do not have strong incentives to put much effort into training you and apparently often won't. This is pretty demotivating.
A thing you seem to be advocating for is PhDs primarily at top programs. These are very competitive, it is hard to make progress towards getting into a better program once you graduate, and there is a large opportunity cost to devoting my entire undergraduate degree to doing enough research to be admitted.
PhDs take up many years of your life. Life is short.
It is very common for PhD students (not just in alignment) to tell other people not to do a PhD. This is very concerning.

If I was an impact-maximizer I might do a PhD, but as a person who is fairly committed to not being depressed, it seems obvious that I should probably not do a PhD and look for alternative routes to becoming a research lead instead.

I'd be interested to hear whether you disagree with these points (you seem to like your PhD!), or whether this post was just meant to address the claim that it doesn't train you to be a good researcher.

AdamGleave (1mo)

Whether a PhD is something someone will enjoy is so dependent on individual personality, advisor fit, etc that I don't feel I can offer good generalized advice. Generally I'd suggest people trying to gauge fit try doing some research in an academic environment (e.g. undergrad/MS thesis, or a brief RA stint after graduating) and talk to PhD students in their target schools. If after that you think you wouldn't enjoy a PhD then you're probably right!

Personally I enjoyed my PhD. I had smart & interesting colleagues, an advisor who wanted me to do high-quality research (not just publish), I had almost-complete control over how I spent my time, could explore areas I found interesting & important in depth. The compensation is low but with excellent job security and I had some savings so I lived comfortably. Unless I take a sabbatical I will probably never again have the time to go as deep into a research area so in a lot of ways I really cherish my PhD time.

I think a lot of the negatives of PhDs really feel like negatives of becoming a research lead in general. Trying to create something new with limited feedback loops is tough, and can be psychologically draining if you tie your self-worth to your work output (don't do this! but easier said than done for the kind of person attracted to these careers). Research taste will take up many years of your life to develop -- as will most complex skills. etc.

Tamsin Leake (1mo)

So this option looks unattractive if you think transformative AI systems are likely to be developed within the next 5 years. However, with a 10-year timeframe things look much stronger: you would still have around 5 years to contribute as a research lead.

This phrasing is tricky! If you think TAI is coming in approximately 10 years then sure, you can study for 5 years and then do research for 5 years.

But if you think TAI is coming within 10 years (for example, if you think that the current half-life on worlds surviving is 10 years; if you think 10 years is the amount of time in which half of worlds are doomed) then depending on your distribution-over-time you should absolutely not wait 5 years before doing research, because TAI could happen in 9 years but it could also happen in 1 year. If you think TAI is coming within 10 years, then (depending on your distribution) you should still in fact do research asap.

(People often get this wrong! They think that "TAI probably within X years" necessarily means "TAI in approximately X years".)

Richard Ngo (1mo)

But if you think TAI is coming within 10 years (for example, if you think that the current half-life on worlds surviving is 10 years; if you think 10 years is the amount of time in which half of worlds are doomed)

Note that these are very different claims, both because the half-life for a given value is below its mean, and because TAI doesn't imply doom. Even if you do have very high P(doom), it seems odd to just assume everyone else does too.

then depending on your distribution-over-time you should absolutely not wait 5 years before doing research, because TAI could happen in 9 years but it could also happen in 1 year

So? Your research doesn't have to be useful in every possible world. If a PhD increases the quality of your research by, say, 3x (which is plausible, since research is heavy-tailed) then it may well be better to do that research for half the time.

(In general I don't think x-risk-motivated people should do PhDs that don't directly contribute to alignment, to be clear; I just think this isn't a good argument for that conclusion.)
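The disagreement above can be made concrete with a toy model. This is a hedged sketch under strong simplifying assumptions, all hypothetical: research only counts if done before TAI arrives, the 5 training years produce no useful research, and research quality is constant afterwards. It compares "TAI in approximately 10 years" (all probability at year 10) with "TAI within 10 years" (probability spread evenly over years 1 to 10), using the post's 5-year training period and the 3x quality multiplier from the comment above.

```python
from fractions import Fraction


def expected_research_value(arrival_probs, training_years, quality_multiplier):
    """Expected quality-weighted research-years completed before TAI arrives.

    arrival_probs: hypothetical mapping {years_from_now: probability} for when
    TAI arrives. Toy assumptions: research after TAI is worth nothing, training
    years produce no useful research, and quality is constant after training.
    """
    assert sum(arrival_probs.values()) == 1, "probabilities must sum to 1"
    return quality_multiplier * sum(
        p * max(t - training_years, 0) for t, p in arrival_probs.items()
    )


# "TAI in approximately 10 years": all probability mass at year 10.
point_mass = {10: Fraction(1)}
# "TAI within 10 years": spread evenly over years 1..10 (a hypothetical choice).
uniform_within_10 = {t: Fraction(1, 10) for t in range(1, 11)}

for name, dist in [("in ~10 years", point_mass), ("within 10 years", uniform_within_10)]:
    now = expected_research_value(dist, training_years=0, quality_multiplier=1)
    phd = expected_research_value(dist, training_years=5, quality_multiplier=3)
    print(f"TAI {name}: research now = {float(now):.2f}, 5-year PhD at 3x = {float(phd):.2f}")
```

Under these toy assumptions the point-mass case gives 10.00 for researching now versus 15.00 for the PhD, while the uniform case gives 5.50 versus 4.50, so which option wins depends on both the arrival distribution and the quality multiplier.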

Stephen McAleese (13d)

I think this section of the post is slightly overstating the opportunity cost of doing a PhD. PhD students typically spend most of their time on research, so ideally they should be doing AI safety research during the PhD (e.g. Stephen Casper). If the PhD is in an unrelated field or for the sake of upskilling, then there is a more significant opportunity cost relative to working directly for an AI safety organization.

AI ALIGNMENT FORUM