Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn

I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other program I’m involved with.

In 2022, I applied to multiple full-time AI safety positions. Since then, I have switched sides and run multiple hiring processes for Apollo Research. Because of this, I feel like I understand the AI safety job market much better and may be able to help people who are looking for AI safety jobs to get a better perspective.

This post obviously draws a lot from my personal experiences, many of which may not apply to your particular situation, so take my words with a grain of salt.

Executive summary

In the late summer of 2022, I applied to various organizations working on AI safety. I got to the final stages of multiple interview processes but never received an offer. I think in all cases, the organization chose correctly. The person who received the offer in my stead always seemed like a clearly better fit than me. At Apollo Research, we receive a lot of high-quality applications despite being a new organization. The demand for full-time employment in AI safety is really high. This should probably change applicants’ strategy and expectations but should not stop them from applying!
Focus on getting good & provide legible evidence: Your social network helps a bit but doesn’t substitute for missing skills, and grinding Leetcode (or other hacks for the interview process) probably doesn’t make a big difference. In my experience, the interview processes of most AI safety organizations are meritocratic and high signal. If you want to get hired for an evals/interpretability job, do work on evals/interpretability and put it on your GitHub, do a SERI MATS stream with an evals/interpretability mentor, etc. This is probably my main advice: don’t overcomplicate it, just get better at the work you want to get hired for and provide evidence for that.
Misc:
1. Make a plan: I found it helpful to determine a “default path” that I’d choose if all applications failed, rank the different opportunities, and get feedback on my plan from trusted friends.
2. The application process provides a lot of information: Most public writings of orgs are 3-6 months behind their current work. In the interviews, you typically learn about their latest work and plans, which is helpful even if you don’t get an offer.
3. You have to care about the work you do: I often hear people talking about the instrumental value of doing some work, e.g. whether they should join an org for CV value. In moderation this is fine, but when overdone, it will come back to haunt you. If you don’t care about the object-level work you do, you’ll be worse at it and it will lead to a range of problems.
4. Honesty is a good policy: Being honest throughout the interview process is better for the system and probably also better for you. Interviewers typically spot when you lie about your abilities, and even if they didn’t, you’d be found out the moment you start the job. The same is true to a lesser extent for “soft lies” like overstating your abilities or omitting important clarifications.

It can be hard & rejection feels bad

There is a narrative that there aren’t enough AI safety researchers and many more people should work on AI safety. Thus, my (arguably naive) intuition when applying to different positions in 2022 was something like “I’m doing a Ph.D. in ML; I have read about AI safety extensively; there is a need for AI safety researchers; so it will be easy for me to find a position”. In practice, this turned out to be wrong.

After running multiple hiring rounds within Apollo Research and talking to others who are hiring in AI safety, I understand why. There are way more good applicants than positions and even very talented applicants might struggle to find a full-time position right away. I think this is bad and hope that there will be more AI safety orgs to use that talent (as I’ve detailed here). Currently, AI safety organizations have the luxury of hiring candidates who can contribute almost immediately. In other industries, employers expect that they have to invest the first couple of months to upskill new hires. In AI safety, the supply of talent is high and programs like SERI MATS are doing a great job, so many of the best candidates can contribute from day one.

While this may sound demotivating to some, I think it’s important to know. For example, I’d recommend applying to multiple positions and programs and spreading your bets a bit wider since any individual position might be hard to get. On the other hand, I think fairly fast upskilling is possible (see next section), so you can improve your chances a lot within a couple of months.

Another realization for me was that getting a highly desirable job is just hard in general. Michael Aird has written and talked about this already but I think it’s still important to keep in mind. Most tech companies that work on AI safety probably get more than 100 applications per position. Even if you’re among the top five candidates for a position, rejection is still more likely than an offer. Furthermore, there is some stochasticity in the hiring process (but much less than I expected). The company has to make a decision with only a few datapoints. They know your CV and they have somewhere between 3 and 10 hours of interviews to judge you by which is not a lot of time. Also, you might have had a bad day for an interview or were unable to answer a specific question. So the fact that you got rejected may not mean a lot in the grand scheme of things.

A good friend of mine provided a framing for the hiring process that I found helpful. He thinks of it as a repeated series of coinflips where every individual flip has a low chance of coming up heads but if one does, you win big and can stop flipping for a while. If a job is desirable it is competitive and if it is competitive, rejection is more likely than an offer, even if you are among the best couple of candidates. However, this doesn’t mean you shouldn’t flip the coin anymore. You should still apply to jobs you want and think you’d be a good fit for.

Nonetheless, rejection sucks. I know that rejection is the norm and yet I was disappointed every time. I can even rationally understand the decision--other candidates were just better than me--but emotionally it still feels bad. If you care about the work you do, you usually want to work in a team of people who care as well. Furthermore, I think it is rational for you to envision how you would work in a specific company to be able to present a clear vision for your research during the interviews. I noticed that during this process I could really see myself in that position and built up a vision of what work there would look like. Then, a rejection just pops that bubble and you have to repeat the same process for another company.

Also, while rejections feel bad at the time, in the grand scheme of things they are really pretty minor. So if the fear of rejection stops you from applying in the first place, I’d really recommend finding some way to overcome that fear, e.g. by having a friend send the application with you. The benefits of applying are typically much higher than the downsides if you get rejected (see later section).

Main takeaways: Most people don’t get their dream job on the first try. Rejection usually feels bad but it’s still worth applying to jobs that you think you are a good fit for. Spreading your bets is probably a good policy.

Focus on getting good

There is some amount of stochasticity in the hiring process and there are some benefits to being well-connected. However, my main takeaway from both sides of the hiring process is that the process mostly works as intended--the people with the best fit for the position get hired and the system is fairly meritocratic overall.

Ironically enough, my rejections made me more convinced that the process is meritocratic. Whenever I felt like I didn’t perform well in an interview, I got rejected shortly after, indicating that the interviewers potentially had the same feeling. Furthermore, the AI safety community isn’t that large, so I often knew the people who got the offer instead. Before I knew who got the offer, my brain would go “The process isn’t accurate, they probably made a mistake or the other person was just well connected” and as soon as I knew who got hired it was immediately very clear that they were just a better fit for the position and that the organization had just made the right call in rejecting me.

Furthermore, I now think well-run interviews provide way more evidence about a candidate's fit than I had expected. I’ve run maybe 70-100 interviews this year so far and I was surprised by how well you can judge the fit of a candidate in such a short amount of time. Firstly, it’s much harder to fake expertise than I thought. Thus, a well-posed interview question will very quickly find the limits of your actual expertise and differentiate between candidates. For example, I think it’s very hard to simulate being a good ML engineer for 60 minutes without actually being one. There might be some stochasticity from daily performance but the variance just seems fairly small in contrast to the difference in underlying skill. Secondly, people are (thankfully) mostly honest in interviews and are straight about their limitations and skills. So just asking specific questions directly already gives you a good sense of their fit. Also, most people are not very good at lying and if you overplay your achievements, interviewers will likely catch onto that pretty quickly (see section “Honesty is a good policy”).

Thus, my main recommendation is to “focus on getting good”. This might sound incredibly obvious but I think people sometimes don’t act according to that belief. There are a lot of other things you could focus on in the belief that they are the best way to increase your chances of a job offer. For example, you might overfit to the interview process by grinding Leetcode 24/7, you might invest a lot of time and effort into building a personal network that then gets you a job you’re not actually qualified to do, or you might jump on lots of projects without actually contributing to them to increase your visibility.

However, I think most of these shortcuts will come back to haunt you and every sustainable benefit is mostly a consequence of being good at the core skill you’re hired for. You might be able to trick some people here and there but once you work with them they will realize you’re not up for the job if you don’t have the required skills. Also, I think people overestimate their ability to Goodhart the interview process. Interviewers typically have a lot more experience at interviews than candidates. If you’re trying to oversell, a skilled interviewer will catch you right in the act.

Thus, rather than grinding Leetcode, building a network, or jumping around between projects, I suggest people focus on doing a project that requires similar skills to the job they’re looking for and then provide legible evidence of their skills (see next section).

Typically, it’s fairly obvious what “getting good” means, e.g. because organizations state it in detail in their job descriptions or on their websites. Most of the time, organizations that are hiring are also fairly straightforward when you just ask them what they are looking for.

Main takeaways: focus on getting good at the core skills that the job requires, and ignore most other stuff. Don’t overfit to Leetcode questions and don’t freak out because you don’t have a big network. Assess your skills honestly and focus on the core things you need to get better at (which are typically fairly obvious). If your key skills are there and visible, everything else will come on its own.

Provide legible evidence of your abilities

Words are cheap and it is easy to say what you plan on doing or what kind of vision you have. Doing good research in practice is always harder than imagined and costs time. Therefore, providing evidence that you’re not only able to think about a specific topic but also able to make practical progress on it is an important signal to the potential employer.

One of the things employers are looking for the most is “has that candidate done good work very close to the work they’d be hired for in the past?” and I think this is for good reasons. Imagine having two hypothetical candidates for an interpretability position: both have a decent ML background and are aligned with your organization’s mission. Candidate A has interesting thoughts on projects they would like to run if hired, candidate B also has good thoughts and on top of that a 3-month interpretability project with public code under their belt. You can judge candidate B so much better than candidate A. The fact that candidate B did a project means you can judge their actual skills much more accurately. The fact that they then applied likely means they enjoyed the project and are motivated to continue working on interpretability. The fact that they finished it, published the code, and wrote up a report tells you a lot about non-technical skills such as their productivity and endurance. Lastly, they already invested 3 months into interpretability, so they are much closer to being able to contribute meaningfully to the projects in your organization right away. For candidate A, you have much more uncertainty about all of these questions, so the comparatively small difference of this project (it’s only 3 months of difference vs. ~5 years of education that they share) makes a huge difference for the employer.

Therefore, providing concrete evidence of your abilities seems like one of the best ways to increase your chance of an offer. The specific evidence obviously differs between employers and jobs, but it’s usually not hard to guess what evidence companies are looking for, e.g. just look at their job descriptions or their website. Furthermore, most companies will just tell you exactly what skills they are looking for if you ask them.

Doing a project on your own is possible but much harder than with mentorship or collaborators. Therefore, I strongly recommend applying to SERI MATS, ARENA, the AI safety camp, or similar programs to work on such projects. I personally benefitted a lot from SERI MATS despite having previous independent research experience and can wholeheartedly recommend the program.

In the past, multiple people or organizations have reached out to me and asked me to apply for a position or assist them with a project, and almost always one of my public blogposts was explicitly mentioned as the reason for reaching out. So the fact that I had publicly legible evidence was, in fact, the core reason for people to think of me as a potential candidate in the first place.

Lastly, putting all of the other reasons aside--working on a project for a couple of months actually provides you with a lot of new evidence about yourself. For example, I was convinced that I should go in a specific direction of AI safety multiple times and when I started a project in that direction, I quickly realized that I either didn’t enjoy it or didn’t feel good enough to meaningfully contribute. Thus, working on such projects not only increases your employability, it also prevents you from committing to paths you aren’t a good fit for.

In general, I can strongly recommend just going for it and diving into projects in fields that sound interesting even if you’re new to them. You don’t need to read all the other stuff people have done or ask anyone for permission. You can just start hacking away and see if you enjoy it. I think it’s by far the most efficient way to both get better at something and test your fit for it. When you enjoy the work, you can still read all related papers later.

Main takeaways: Providing legible evidence for your skills makes a big difference in your employability. Doing concrete projects that would produce such evidence is also a great way to test whether you’re good at it yourself and whether you enjoy it.

Make a plan

When I started my job search process, I created a Google doc that contained the following items:

An overview of my CV where I try to evaluate how a reviewer would interpret different achievements.
A list of strengths and weaknesses from introspection and from feedback in the past.
A list of organizations that I could imagine working for with pros and cons for each organization.
A “default path”, i.e. a path that I would want to pursue if I got rejected by all orgs I applied to (in my case this was independent research and upskilling).
A priority list of organizations with predictions of my probability of receiving an offer (all predictions ranged from 5 to 20 percent); I then measured all organizations against the “default path” and decided not to apply to any organization that I preferred less than the default path. This left me with 5 or 6 organizations and a probability of getting one or more offers of ~50%.

I then sent this document to a handful of trusted friends who gave me really valuable feedback, changed a couple of things as a result, and then applied to the organizations that were preferable to the default path.
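The aggregate estimate mentioned in the list above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not the calculation I actually used: the individual numbers are hypothetical values picked from the 5 to 20 percent range, and it assumes that the outcomes of different applications are independent, which is probably too optimistic since the same strengths and weaknesses show up in every process.

```python
from math import prod


def chance_of_at_least_one_offer(offer_probabilities):
    """Probability of receiving one or more offers.

    Assumes the applications are independent of each other, so the
    chance of being rejected everywhere is the product of the
    individual rejection probabilities.
    """
    return 1 - prod(1 - p for p in offer_probabilities)


# Hypothetical per-organization estimates (illustrative only),
# each within the 5-20% range described in the text.
estimates = [0.05, 0.10, 0.10, 0.15, 0.15, 0.20]

aggregate = chance_of_at_least_one_offer(estimates)
print(f"Chance of at least one offer: {aggregate:.0%}")
```

With these made-up inputs the result lands a little above one half, which is in the same ballpark as the ~50% above. If the outcomes are positively correlated, the real number is likely lower, which is one more reason to plan the default path in detail.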

I found this process extremely valuable because

It forced me to evaluate my strengths and weaknesses. During the feedback gathering round, I realized that many people mentioned similar strengths that I had not considered before (I think the feedback is true, it’s just one of these things that “everyone else is mysteriously bad at” and I thus didn’t think of it as a strength of mine). This influenced how I wrote applications and what I emphasized in the interview.
It forced me to make the case for and against every organization. During this process, I realized that some organizations did not really fit my agenda or skillset that well, and that I had previously wanted to apply to them for status. Making the explicit case made me realize that I shouldn’t apply to these orgs.
It forced me to come up with a “default path” which I found very anxiety-reducing. Once I had a default path that I was comfortable with, I felt like nothing could go very wrong. In the worst case, I’d follow the default plan, which I thought was still pretty good. I just couldn’t fall very low even if rejection would feel bad.
It forced me to put honest probability estimates on my chances. This made me realize that my estimated aggregate chance of getting an offer was only about 50%, which made me plan my default path in much more detail.
The feedback from my friends was very valuable. It was helpful to get my reasoning checked and my self-evaluation criticized constructively.

I don’t think it’s absolutely necessary to make such a plan but I think it structured my thinking and application process a lot and it probably saved me time in the long run. It took me maybe a maximum of 10-20 hours in total to write the doc but saved time for every application by reducing my total number of applications.

Main takeaways: Making a plan can structure your application process a lot. I would especially recommend it to people who are looking for their first job.

The application process provides a lot of information

For a while, I had the intuition that “I need to prepare a lot more before applying”. I think this intuition is mostly false. There are cases where you just clearly aren’t a good fit but I think the bar for applying is much lower than many (especially those with imposter syndrome) assume. My heuristic for applying was “Would I take the offer if I got one and do I expect to make it through the screening interview” (this might even be too high of a bar; when in doubt, just apply).

There are many ways in which the application process gives you valuable information and feedback:

The job description is often relatively detailed and companies say very clearly what they want. Just looking through the descriptions often gave me a good sense of whether the position aligns with the research I find most promising and whether I should apply in the first place. It also gives a pretty clear path to which kind of research you might want to work on if you want to increase your chances in the future. Most organizations are pretty transparent with what they are looking for.
If you get rejected without being invited to an interview, this is unfortunate but still valuable feedback. It basically means “You might not be there yet” (though as Neel points out in the comments, CV screening can be a noisy process). So you should probably build more skills for 6 months or so before applying again.
If you get into the interviews, you usually have a screening interview in the beginning that covers what you want to work on, what the company wants to do, etc. While some of this information is public, the company's public record usually lags behind the actual state of research by 3 months or more. So talking to someone about what the org is currently working on or intends to work on in the future can give you a lot of valuable information that you wouldn’t get from their website. This made it much easier for me to decide whether my own research goals were aligned with those of the org.
The technical interviews give you some sense of what kind of level the company is looking for. If they feel easy, your base skills are probably good enough. If they feel hard, you might want to brush up. I found that technical interviews really differ between companies where some do very general coding and math questions and others very concrete problems that they have already encountered within their work. I found the latter to be much more valuable because I got a better feeling for what kind of problems they see on a day-to-day basis.
I applied to research scientist positions and thus usually had research interviews. In these, you talk about the research you did in the past and the audience asks you questions about that. I found it valuable to not only talk about my past research projects but also lay out what I intend to work on in the future. In my case, my future plans have nothing to do with my Ph.D., so it felt important to emphasize the difference. In general, I found it helpful to prepare for the research interviews with the question “What do I want other scientists at that organization to know about me?” in mind.
In some cases, you get a final interview, e.g. with the manager you would be working under or some higher-up in the company. These interviews are usually not technical and can be very different from person to person. In some cases, it was just a friendly chat about my research interests, in other cases, I was asked detailed questions about my understanding of the alignment problem. During the latter interview, I realized that I was unable to answer some of the more detailed questions about the alignment problem. On the one hand, I knew right after the interview, that I’d be rejected but on the other hand, it forced me to think about the problem more deeply and led to a change in my research agenda. Thus, I’d say that this interview was extremely helpful for my development even if it led to a rejection.
The final decision of whether they make you an offer or not is valuable feedback but I wouldn’t update too much on rejection. If you get the offer, that’s obviously nice feedback. If you get into the final round, that usually means you’re nearly there but still need to improve and refine a bit, though a rejection at that stage could also be a result of the stochasticity of the interview process.

Importantly, interviews go both ways. It’s not just the organization interviewing you, it’s also you interviewing them. Typically, after every interview, you can ask questions about them, e.g. what they are currently working on, what their plans are, what the office culture looks like, etc. The quality of the interviews is also feedback for you, e.g. if they are well-designed and the interviewer is attentive and friendly, that’s a good sign. Whenever an interview was badly designed or the interviewers just clearly didn’t give a shit, I updated against that organization (sidenote: if you went through Apollo Research’s interview process and felt like we could improve, please let me know).

Main takeaways: The hurdle for applying is probably lower than many people think. I find “Would I take the job if I got an offer and do I expect to get through the CV screening?” to be a good heuristic. Interviews provide a lot of information about the organization you’re applying to. Interviews go both ways--if the interview feels bad, this is evidence against that organization.

You have to care about your work

I think that people interested in AI safety are more likely than the median employee to do their job for the right reasons, i.e. because they think their work matters a lot and it is among the best ways for them to contribute. However, many other factors influence such an important decision--status, money, hype, flavor of the month, etc. My experience so far is that these other influences can carry your motivation for a couple of months but once things get tough it usually gets frustrating and it’s much harder to show a similar level of productivity as with a project you actually deeply care about.

Caring about a project can come in many flavors and is not restricted to intrinsically caring about that particular project. The project could also just be a small step to reaching a larger goal you care about or to learn something about another project you care about.

For me, a good heuristic to investigate my motivations is “Would I do this work if nobody cared?”. For a long time, I was not sure what approaches I find most promising and am a good fit for. After a period of explicitly thinking about it, I converged on a cause and approach that I felt very good about. At that point, I realized that I deeply cared about that particular avenue (detecting deceptive alignment with empirical approaches), and my future plans were roughly “apply to a company and work on X if accepted” or “if I don’t get an offer, work on X anyway (on a grant or self-funded)”. I found this insight really helpful and freeing and my speed of improvement increased as a result. It also led to me starting Apollo Research because nobody worked on exactly the thing I found most important.

More concretely, I think there is a common failure mode of people deeply caring about AI safety in general and therefore thinking they should work on anything as long as it relates to AI safety somehow. My experience is that this general belief does not translate very well to your daily mood and productivity if you don’t also enjoy the object-level work. Thus, if you notice yourself having thoughts like “I’d really like to work for <AI safety org> but I don’t think their object-level agenda is good”, it may be a good time to rethink your plans despite salary, status, experience, and other pull factors.

Obviously, there are caveats to this. Early on, you don't know what you care about and it's good to just explore. Also, you shouldn't overoptimize and always only do exactly the thing you care most about since there are real-world trade-offs. My main point is that if you consistently find that you don't really care that much about the work you're doing, it's a good time to ask if it is worth continuing.

Main takeaways: There are many reasons why we choose to work on different projects or in different positions. In my personal experience, the motivation from most reasons fades unless you actually care about the actual work you do on a day-to-day basis.

Honesty is a good policy

A general takeaway from job applications, hiring, and working in my current role is that honesty is generally a good policy (which doesn’t mean you should always say every single thought you have).

From a systemic perspective, honesty makes everything much more efficient. When candidates accurately report their skills, it’s much easier to make decisions than when they try to game the system since fewer guardrails against lying need to be put in place. Thus, it would require less time for interviewers and applicants to go through the hiring process. However, such a system could be exploited by a skilled adversary who lies about their skills. Thus, organizations have to protect against these adversaries and make the process harder and more robust. This increases costs and bloats up the process.

Typically, this means that interviewers have to double-check information and dive much deeper into topics than would be necessary if everyone were honest. Since hiring is a dynamic process, some applicants will try to come up with new ways to game the process and the organization has to spend a disproportionately large amount of time finding and dealing with these adversarial applicants.

However, my best guess is that being such a dishonest adversary is rarely beneficial for the applicants themselves. Interviewers often have had hundreds of interviews and thus have much more experience spotting dishonest applicants while any given applicant probably had much less. Furthermore, most employers (AI safety orgs probably even more than others) see dishonesty as a big red flag if found out.

Furthermore, just from a personal view, you probably also prefer to work with honest people, so a signal that you’re committed to honesty may make other people around you more honest too.

Finally, I think it’s good to explicitly pre-commit to honesty before the hiring process. It’s plausible that there will be situations where you could get away with some cheating here and there or some “small lies” but it’s bad for the overall system and likely not worth the risk of getting caught. For example, you may ask a friend who has already gone through the process to give you hints on what to prepare or try to overplay your skills when you expect the interviewer not to check the claims in detail. When you really want to have the job or you panic during the process, you may be tempted to cheat “just a bit”. To prevent yourself from rationalizing cheating in the moment, I’d recommend pre-committing to honesty.

Main takeaways: Just be honest. Neither overstate nor understate your abilities and answer questions accurately. It’s good for the system as well as yourself.

Final words

My impression of the AI safety job market so far is that, while the system is sometimes a bit unorganized or stochastic, it is mostly meritocratic and the people who get offers tend to be good fits for the job.

Most desirable jobs have a lot of competition and even very good people get rejections. However, this does not mean that you should give up. It is possible to improve your skills and since the field is so young, it doesn’t take a long time to contribute.

There are many things you can do to increase your chance of getting an offer, such as grinding Leetcode or building a network, and while these are certainly helpful I would not recommend prioritizing them. The primary reason why someone gets a job is that they are good at it. So my number one recommendation would be “focus on getting good and everything else will be much easier”.

Neel Nanda, 5mo

Thanks for writing this, this is a great post and I broadly agree with most of it!

If you get rejected without being invited to an interview, this is unfortunate but still valuable feedback. It basically means “You clearly aren’t there yet”. So you should probably build more skills for 6 months or so before applying again.

This feels false to me. I've done a lot of CV (aka resume) screening, and it is a super noisy process, and it's easy to be overly credentialist and favour people with legible signalling. There's probably also a fair amount of noise in how well you write your CV to make the crucial information prominent. (relevant work experience, relevant publications, degrees, relevant projects, anything else impressive you've done). Further, "6 months of upskilling" may not turn out anything super legible (though it's great if it does, and this is worth aiming for!)

My MATS application has a 10 hour work task, and it's like night and day looking at the difference between how much signal I get from that and from just the CV, and I accept a lot of candidates who look mediocre on paper (and vice versa).

If you're getting desk rejected from jobs, I'd recommend asking a friend (ideally one with some experience in the relevant field/industry or who's done hiring before) to look at your CV/application to some recent jobs and give feedback.

Marius Hobbhahn, 5mo

Thx. updated:

"You might not be there yet" (though as Neel points out in the comments, CV screening can be a noisy process)~~“You clearly aren’t there yet”~~

Neel Nanda, 5mo

Thanks!

AI ALIGNMENT FORUM