Evaluations (of new AI Safety researchers) can be noisy

LawrenceC

TL;DR: Evaluating whether or not someone will do well at a job is hard, and evaluating whether or not someone has the potential to be a great AI safety researcher is even harder. This applies to evaluations from other people (e.g. job interviews, first impressions at conferences) but especially to self-evaluations. Performance is also often idiosyncratic: people who do poorly in one role may do well in others, even superficially similar ones. As a result, I think people should not take rejections or low self confidence so seriously, and instead try more things and be more ambitious in general.

Epistemic status: This is another experiment in writing fast as opposed to carefully. (Total time spent: ~4 hours.) Please don’t injure yourself using this advice.^[1]

Introduction: evaluating skill is hard, and most evaluations are done via proxies

I think people in the LessWrong/Alignment Forum space tend to take negative or null evaluations of themselves too seriously.^[2] For example, I’ve spoken to a few people who gave up on AI Safety after being rejected from SERI MATS and REMIX; I’ve also spoken to far too many people who are too scared to apply for any position in technical research after having a single negative interaction with a top researcher at a conference. While I think people should be free to give up whenever they want, my guess is that most people internalize negative evaluations too much, and would do better if they did less fretting and more touching reality.

Fundamentally, this is because evaluations of new researchers are noisier than you think. Interviews and applications are not always indicative of the applicant’s current skill. First impressions, even from top researchers, do not always reflect reality. People can perform significantly differently in different work environments, so failing at a single job does not mean that you are incompetent. Most importantly, people can and do improve over time with effort.

In my experience, a lot of updating so hard on negative examples comes from something like anxious underconfidence as opposed to reasoned arguments. It’s always tempting to confirm your own negative evaluations of yourself. And if you’re looking for reasons why you’re not “good enough” in order to handicap yourself, being convinced that one particular negative evaluation is not the end of the world will just make you overupdate on a different negative evaluation. Accordingly, I think it’s important to take things a little less seriously, be willing to try more things, and let your emotions more accurately reflect your situation.

Of course, that’s not to say that you should respond to any negative sign by pushing yourself even harder; it’s okay to take time to recover when things don’t go well. But I strongly believe that people in the community give up a bit too easily, and are a bit too scared to apply to jobs and opportunities. In some cases, people give up even before the first external negative evaluation: they simply evaluate themselves negatively in their head, and then give up. Instead of doing this, you should try your best and put yourself out there, and let reality be the judge.

My personal experience

I’m always pretty hesitant to use myself as an example, both because I’m not sure I’m “good enough” to qualify, and also because I think people should aspire to do better than I have. That being said, in my case:

I’m currently a researcher at the Alignment Research Center’s Evaluations team, was previously at Redwood Research and a PhD student at CHAI, have received offers from other AI labs, on the board of FAR, and have been involved in 5+ papers I’m pretty proud of in the past year.

In the past, I’ve had a bunch of negative signs and setbacks:

I did not have a math background when I first learned about AI Safety in 2014 as a high school senior/college freshman. I did not know how to code when I started my undergrad. I had deliberately avoided STEM subjects in high school: I did not do any extracurriculars involving math or physics, and definitely did not participate in (let alone win) any olympiads.
At my first CFAR Workshop in 2015, I was told by the organizer that I was probably not good enough at math to contribute to AI safety.
I applied twice for OpenAI internships, in 2017 and 2018, and was rejected without an interview both times.
The first draft of my first paper was pretty mediocre, and got destroyed by reviewers when we submitted it to ICLR 2018.
I was quite depressed during 2020-2021 due to a combination of COVID and not enjoying my research at CHAI. As a result, I went on a leave of absence in Jan 2022 to work at Redwood Research.

I also don’t think my case (or Beth’s case below) was particularly unusual; several other AI safety researchers have had similar experiences. So empirically, it’s definitely not the case that a few negative evaluations mean that you cannot ever become an AI safety researcher.

Why exactly are common evaluations so noisy?

Previously, I mentioned three common evaluation methods—-interviews/job applications, first impressions from senior researchers, and jobs/work trial tasks—and claimed that they tend to be noisy. Here, I’ll expand on why each evaluation method can be noisy in detail, even in cases where all parties are acting in good faith.

This section is pretty long and rambly; feel free to skip to the next header if you feel like you’ve got the point already.

Bootcamp/Funding/Job Applications

By far the most common negative evaluation that most people receive is being rejected from a job or bootcamp, or having a funding application denied. While this is pretty disheartening, there’s a few reasons why a rejection may not be as informative as you might expect:

Most applications just don’t contain that much information. Fundamentally, there just isn’t a lot to go on when it comes to any of these applications. In some cases, all that the application reviewers have to go on is a short application form and your CV or resume. In other cases, you might get a single 30 minute or 1 hour interview. Even in the most informative cases, such as a few hour work trial task or an onsite interview day, the interviewer will probably only be evaluating you on a few hours of content in total. At the same time, reviewers just don’t spend that much time reading applications – for example, reviewers for the first round of UC Berkeley’s AI PhD program are expected to spend only around 10 minutes per application. This just isn’t enough information to perfectly capture the current skill of the applicant, let alone their future potential.
Not all relevant skills show up on paper; not everything that shows up on paper is relevant. Success at research involves many soft or hard-to-document skills that don’t show up on CVs or application forms by default, and that may be hard to elicit in interviews. For example, it’s hard to evaluate the research taste of new researchers, as almost everyone starts out with poorly thought out research ideas, and gets better with practice! Similarly, project management skills, motivation to do research, and executive function can also be hard to evaluate. On the other hand, many of the proxies that are salient to reviewers (such as educational background, grades, number of publications, etc.) are relatively weak indicators of research success.
Performance in interviews can vary greatly with time or talent. Different people have different levels of “interview skill”. As a result, many people may appear significantly better or worse than their actual skill level. How well people do on interviews also tends to vary greatly by time of day. For example, I am not a morning person, and perform significantly worse when interviews are scheduled for 7 or 8 AM, while I know others who do worse when interviews are scheduled later during the day. And sometimes, you’re just nervous and do poorly.
There’s often an element of luck when it comes to preparation. When it comes to coding interviews in particular, there’s a lot of variance depending on your experience – for example, it’s possible that you’ve just solved a very similar problem to the coding problem given to you, which means you’ll do significantly better than you would otherwise. On the other hand, it’s possible that you encounter a wholly unknown problem, and perform poorly as a result. It’s also possible that you just happened to focus your application on a project that the application reviewer was particularly impressed by, or that you focused your application on a project they disliked.
Sometimes the rejection isn’t about you. In a few cases, there just isn’t the headcount or funding for the thing you’re applying to, or they were looking for other particular skills, or there were just a few extremely qualified candidates in your particular batch. I’ve even heard of cases where a clerical error led to an accidental immediate rejection of a very qualified candidate that was later hired for the same job.^[3] Even in cases where this isn’t the deciding factor, it’s rarely the case that such prosaic considerations are never a factor.

At the end of the day, not every denied application will come with a clearly denominated reason. I’d strongly recommend against immediately slapping on “the reason is because I’m bad” to every rejection.

First impressions at parties/conferences/workshops

Insofar as applications don’t accurately reflect your skill or abilities, first impressions in social settings such as parties, conferences, and workshops are even worse.

There’s even less information in a first impression. Just meeting someone for the first time really does not give you that much information about that person. Yes, it’s true that you can pick up on many factors in a single interaction, but the total volume of information is still not that high; there’s a reason for old adages in the style of “don’t judge a book by its cover”. A single, few-minutes long interaction just does not contain that much information, and does not let someone evaluate you with perfect accuracy.
Appearance, stereotypes, and reputation greatly confound initial evaluations. There are many factors beyond aptitude for research that affect people’s initial judgments of others. It’s possible that the reason you were treated less positively is due to factors such as appearance, stereotypes, or even reputation/status. I don’t think that this is necessarily the fault of the evaluators, but instead just an unavoidable part of the human experience. I also don’t think these factors are anything close to insurmountable, but they do affect initial judgments.
Charisma or extraversion don’t make you a great researcher by themselves. A lot of leaving a positive first impression on other people is to demonstrate your knowledge or experience while engaging deeply in the conversation. Unfortunately, this is much harder to do if you’re introverted or socially awkward! Fortunately, neither extraversion nor large amounts of social grace are necessary for research success; many top AI researchers are quite introverted and/or socially awkward.
Preparation matters more than you might think. Presenting your research ideas to other people is a famously challenging task. It takes a lot of effort to get research ideas into a form where it’s possible to get meaningful feedback at all, and different ways of framing the same idea appeal to different people! As a result, it’s very possible that your idea was shot down due to communication errors and not due to it being fundamentally flawed or unworkable.
Really, sometimes it’s not about you. As previously mentioned, many researchers are quite introverted and/or socially awkward. Most senior researchers also have many demands on their time. Sometimes, the negative interaction you had with them is merely due to their social awkwardness or due to them being busy or distracted. Maybe they had a bad day! It’s also worth noting that people often resort to base rates when pressed; them saying that you probably won’t be a great researcher is probably more a statement about base rates than a statement about you in particular.

Yes, having negative social interactions always sucks. But a few negative interactions, even with famous or senior researchers, is not a particularly strong sign that you’re not cut out to be an AI researcher.

Job Performance

It’s definitely true that poor job performance at a research-y job (or even a long work trial) is more of a signal than a rejection or a negative first impression. That being said, I don’t think it’s necessarily that strong of a signal, for the following reasons:

Different jobs require different skills. There are many research-related jobs out there, all of which require different skills. Even superficially similar jobs can require significantly different skill sets. For example, the skills needed to be a successful research engineer (RE) at Anthropic will differ from the skills needed to be a successful RE at Redwood, let alone an RE at an academic lab or ARC Evals.
There are many factors outside of aptitude that determine job performance. Factors such as adverse life circumstances, interpersonal drama with coworkers or managers, personal fit, or different team cultures or management styles can greatly impact how well you do at your job. However, these factors can often be changed, either with time or by changing to a different job.
People can get better at things. It turns out that, in fact, people can and do upskill over time. I’ve known a few people who were able to improve their executive function over time, and many people who went from “not a lot of math/cs background” to “able to do novel interesting research” over the course of a few years, primarily via self-study. I know even more who greatly improved their technical skills on the job. So even if you don’t have the skills to succeed at a particular job now, you might be fully qualified for the job in the future!
Seriously, it’s really sometimes not about you. Employees get fired or let go for many reasons. For example, the organization could be having financial concerns, trying to reduce management load, or be trying to pivot to another style of research. Alternatively sometimes you just don’t get along well with your boss or coworkers!

In my case, I think all four of the reasons applied to some extent for the last two years of my PhD: my skills were not super suited to academia, I was depressed in part due to COVID, I had significantly worse executive function, and I don’t think I enjoyed the academic culture at Berkeley very much. Again, while being let go from a job (or leaving due to poor performance) is definitely a negative sign, I think it’s nowhere near fatal for one’s research ambitions in itself.

Yes, this includes your evaluations as well.

In practice, people seem more hampered by their own self-assessments, more so than any external negative evaluations. I think a significant fraction of people I’ve met in this community have suffered from some form or another of imposter syndrome. I’ve also consistently been surprised by how often people fail to apply for jobs they’re clearly qualified for, and that would like to hire them.

It’s certainly true that you have significantly more insight into yourself than any external evaluator. Empirically, I think that new researchers tend to be pretty poorly calibrated about how well they’d do in research later on, often underperforming even simple outside view heuristics.

Why might self-assessments also suffer from significant noise?

Being good at things does not always feel like being good at things. I often hear statements along the lines of “yes, I can do X really well, but only because it’s easy!” or “I only do well because I cheat by working a lot more/having a good memory/being charismatic/etc”. If you’re good enough at something, it tends to feel easy: in fact, looking for places where other people seem mysteriously bad at easy tasks is a good way of identifying your strengths. Similarly, other people are also doing well due to their unique strengths, which you probably don’t know about.
You don’t know how other people feel from the inside. The vast majority of people feel frustrated or stupid on particular problems they can’t solve. Empirically, I think that a majority of people have had some form or another of negative self-assessment or even full on imposter syndrome. Many great researchers I know have had periods where they felt bad about themselves. So you’re probably overestimating how strongly you should be updating on your own gut feelings or experiences.
You see more of your failings than you do others. Specifically, you might be overestimating how well other people are doing, since people generally advertise their successes and don’t advertise their failures. On the other hand, you’re probably fully aware of all of your failures. This can give you a very skewed view into how well you’re doing relative to your peers.
You might not be fully aware of what particular positions actually want. Different research teams want different skills, and different jobs hold different bars for hiring. It’s often hard to know ahead of time exactly what they’re looking for, or how willing they are to try out less experienced people, especially if you’re not super familiar with the organization in question.
Your self-assessments may be significantly clouded by external factors. There’s an incredibly long list of external factors that affect your self assessment in either direction. For example, your mood probably greatly affects your self assessments – being relatively manic causes people to greatly overestimate their own skill or fit, while being depressive often causes people to underestimate their own skills or fit. More prosaically, even physical discomfort (the classic example of which is being hungry) can cause you to feel bad about your own prospects.

Of course, I think people should aspire to have good models of themselves. But especially if you’re just starting out as a researcher, my guess is your model of your own abilities is probably relatively bad, and I would not update too much off of your self-assessments.

On anxious underconfidence and self-handicapping

More speculatively, I think the tendency for people to over update on noisy negative evaluations is caused in large part due to a combination of anxiety and a desire to self-handicap. AI safety research is often quite difficult, and it’s understandable to feel scared or underconfident when starting your research journey.^[4] And if you believe that such research is important and also feel daunted about whether or not you can contribute at all, it can be tempting to avoid touching reality or even self-handicapping to get an excuse for failure. After all, if your expectations are sufficiently low, you won’t ever be disappointed.

I don’t think this dynamic happens at a conscious level for most people. Instead, my guess is that most people develop it due to status regulation or due to small flinches from uncomfortable events. That being said, I do think it’s worth consciously pushing back against this!

What does this mean you should do?

You should touch reality as soon as possible, and try to get evidence on the precise concern or question you have. Instead of worrying about whether or not you can do something, or trying to extract the most out of the few bits of evidence you have, go gather more evidence! Try to learn the skills you think you don’t have, try to apply for some jobs or programs you think definitely won’t take you, and try to do the research you think you can’t do.

I also find that I spend way more time encouraging people to be more ambitious than the other way around. So on average, I’d probably also recommend trying hard on the project that interests you, and being more willing to take risks with your career.

That being said, I want to end this piece by reiterating the law of equal and opposite advice. While I suspect the majority of people should push themselves a bit harder to do ambitious things, this advice is precisely the opposite of what many people need to hear. There are many other valuable things you could be doing. If you’re currently doing an impactful job that you really enjoy, you should probably stick to it. And if you find that you’re already pushing yourself quite hard, and additional effort in this direction will hurt you, please stop. It’s okay to take it easy. It’s okay to rest. It’s okay to do what you need to do to be happy. Please don’t injure yourself using this advice.^[5]

Acknowledgments

Thanks to Beth Barnes for inspiring this post and contributing her experiences in the appendix, and to Adrià Garriga-Alonso, Erik Jenner, Rachel Freedman, and Adam Gleave for feedback.

Appendix: testimonials from other researchers

After writing the post, several other researchers reached out with additional evidence that they've given me evidence to post:

Addendum from Beth Barnes

Soon after writing the post, Beth Barnes reached out and gave me permission to post about her experiences:

I feel like I have a lot of examples of getting negative signals:
I [Beth] found undergraduate CS felt very hard and I was quite depressed especially in the second year of university. I felt like I wasn't understanding much of the content, and was barely scraping by.
I did a research internship that was highly unsuccessful - I had no idea how my supervisor's code worked and I spent most of the summer stuck on what I thought was an algorithmic problem but was actually a dumb bug I'd introduced at the beginning. After this I concluded I wasn't a good fit for technical research.
I felt like I 'didn't actually know how to code' and was actually not very smart and a total impostor, to the extent that I almost had a panic attack when a friend gave me a mock coding interview
I never even got to interview stage with any big tech company internships I applied to
I had a fixed-term role with an AI lab that I was hoping to extend or turn into a permanent position, but they decided not to continue the role and instead offered me a very junior operations assistant role.
There was discussion of firing me at an AI lab because people weren't excited about my work.
There were two incidents that I consider quite close to being fired, in that a manager had the choice to continue working with me or not, and chose not to.
Despite that,
I currently run the evaluations project at ARC, which various people I respect think is pretty promising.
I've also produced some more standard technical alignment work I'm somewhat happy with.
In the past I was concerned that Paul had been saddled with me (after my previous manager left) and I was wasting his time, but he chose to hire me to ARC in the end.
I feel much better about my ability to code, mostly based on two key moments:
Realizing that trying to use high-level libraries you don't understand makes things much harder to debug, and it's much better for learning and overall faster to work with simple tools you understand well, even if that means writing significantly more code. Recognizing when I'm in a mode of 'randomly changing things I don't understand and hoping it will work', and trying to avoid that as much as possible.
Pair coding during MLAB (after again almost having a panic attack doing the coding test, and probably failing to meet the standard admission threshold on the test) and realizing that I wasn't actually that slow compared to various other people who were certified Good At Coding And ML (TM)
As a manager now, I've had to make various decisions about hiring, with different levels of involvement from skimming CVs to extended work trials. I've felt very uncertain in most cases. In particular, even with extended work trials, there's a lot of uncertainty because:
People have different starting skills/knowledge, but usually what we're actually interested in is growth rate, which is even harder to assess
Various people I chose not to continue with had significantly better technical skills than (I think) I did at their age, which feels confusing
In various cases it felt like how well different people were doing was quite heavily influenced by extraneous factors, like whether they were working from the office, and how much energy and attention I had put into managing them. Ideally I would trial everyone in their optimal circumstances, after putting a decent amount of effort into thinking about what exactly they needed from me to maximally grow and flourish. But given limited resources this is often not what trials looked like.
I can also confirm direct knowledge of at least one case of a good candidate who was ultimately hired getting rejected for totally spurious clerical reasons. I wouldn't be surprised if this has happened various times without anyone even finding out.

Addendum from Scott Emmons

Scott Emmons, a PhD student at UC Berkeley's CHAI, gave me permission to share the following:

I'm happy with you mentioning the example of my getting rejected from the CHAI internship!
I was also rejected from the final round of Jane Street's trading internship interview process. I'm happy for you to mention that too if you think it's relevant.
My perspective on both these rejections is that I don't shine in on-the-spot problem solving interviews

Addendum from anonymous senior AGI safety researcher

Finally, a senior AGI safety researcher (who wishes to remain anonymous) sent me the following:

I listed people who I had had meetings with before 2021, and the meeting was at a time when they were either junior or new to the field (usually both, note I might also have been junior at the time). I then guessed how promising I thought they were at the time, and then said how promising I thought they were now (often this involved some Googling to figure out what they had done in the time since our meeting). I’ll focus here on the n=60 subgroup of “junior people already motivated by AI safety when I talked to them”.

For “promise now minus promise at time of meeting”, the mean is 0.05 and the stddev is 1.37 (on a 10-point scale where in practice most of my numbers were in the 5-8 range). So overall my initial impressions seem calibrated but non-trivially noisy. (Though I don’t take this too seriously since I’m guessing “promise at time of meeting” retroactively.)
The people who I rated highest by promise during an initial meeting usually stayed promising or decreased slightly (this is just optimizer’s curse). For those who stayed the same, the level of promise here is “more likely than not they will be hired by an existing AI safety org (including OpenAI + DeepMind) or achieve something similarly good” but not “probably a top-tier researcher”.
[Re-reading this now, I suspect that I’ve raised my estimate for the bar for getting hired by an AI safety org, and so the level of promise is actually lower than “more likely than not to get hired by an AI safety org”.]
The people who I rated low on promise had much more variance, though tended to increase in promise (again, optimizer’s curse / regression to the mean).
The two top people according to “promise now” had scores of “somewhat above average” and “below average” for “promise at time of meeting”. In general it seems like I’m pretty bad at identifying great people from a single meeting when they are junior.

^{^}
I think this probably also applies in general, but I’m much less sure than in the case of AI research. As always, the law of equal and opposite advice applies. It’s okay to take it easy, and to do what you need to do to recover. I also don’t think that everyone should aim to be an AI safety researcher – my focus is on this field because it’s what I’m most familiar with. If you’ve found something else you’re good at, you probably should keep doing it.
^{^}
I also think there’s a separate problem, where people take positive evaluations of their peers way too seriously. E.g. people seem to noticeably change in attitude if you mention you’ve worked with a high status person at some point in your life. I claim that this is also very bad, but it’s not the focus of the post.
^{^}
This also happens to a comical extent with papers at conferences. E.g. Neel Nanda's grokking work was rejected twice from arXiv (!) but an updated version got a spotlight at ICLR. Redwood's adversarial training paper got a 3, a 5, and a 9 for its initial reviews. In fact, I know of several papers that got orals at conferences, that were rejected entirely from the previous conference.
^{^}
I also feel like this is exacerbated by several social dynamics in the Bay Area, which I might eventually write a post about.
^{^}
If there’s significant interest or if I feel like people are taking this advice too far, I’ll write a followup post giving the opposite advice.

[-]Buck Shlegeris2y1312

I agree with a lot of this post.

Relatedly: in my experience, junior people wildly overestimate the extent to which senior people form confident and sticky negative evaluations of them. I basically never form a confident negative impression of someone's competence from a single interaction with them, and I place pretty substantial probability on people changing substantially over the course of a year or two.

I think that many people perform very differently in different job situations. When someone performs poorly in a job, I usually only update mildly against them performing well in a different role.

[-]Ansh Radhakrishnan2y53

Thanks for this post Lawrence! I agree with it substantially, perhaps entirely.

One other thing that I thing interacts with the difficulty of evaluation in some ways is the fact that many AI safety researchers think that most of the work done by some other researchers is approximately useless, or even net-negative in terms of reducing existential risk. I think it's pretty easy to wrap an evaluation of a research direction or agenda and an evaluation of a particular researcher together. I think this is actually pretty justified for more senior researchers, since presumably an important skill is "research taste", but I think it's also important to acknowledge that this is pretty subjective and that there's substantial disagreement about the utility of different research directions among senior safety researchers. It seems probably good to try and disentangle this when evaluating junior researchers, as much as is possible, and instead try to focus on "core competencies" that are likely to be valuable across a wide range of safety research directions, though even then the evaluation of this can be difficult and noisy, as the OP argues.

[-]Neel Nanda2y32

I appreciate this post, and vibe a lot!

Different jobs require different skills.

Very strongly agreed, I did 3 different AI Safety internships in different areas, where I think I was fairly mediocre in each, before I found that mech interp was a good fit.

Also strongly agreed on the self-evaluation point, I'm still not sure I really internally believe that I'm good at mech interp, despite having pretty solid confirmation from my research output at this point - I can't really imagine having it before completing my first real project!

[-]Ben Pace2y20

As part of my work at Lightcone I manage an office space with an application for visiting or becoming a member, and indeed many of these points commonly apply to rejection emails I send to people, especially "Most applications just don’t contain that much information" and "Not all relevant skills show up on paper".

I try to include some similar things to the post in the rejection emails we send. In case it's of interest or you have any thoughts, here's the standard paragraph that I include:

Our application process is fairly lightweight and so I don't think a no is a strong judgment about a person's work. If you end up in the future working on new projects that you think are a good fit for Lightcone Offices, you're welcome to apply again. Also if you're ever collaborating on a project with a member of the Lightcone Offices, you can visit with them to work together. Good luck in finding traction on improving the trajectory of human civilization.