When I worked a FAANG research job, my experience was that it was socially punishable to bring up AI alignment research in just about any context, except where it was relevant to the team's immediate mission, for example robustness on the scale required for medical decisions (a much smaller scale than AGI ruin, but a notably larger scale, in the sense of errors being costly, than most deep learning systems in production use at the time).
I find that in some social spaces, Rationality/EA-adjacent ones in particular, it's seen as distracting, rude, and low status to emphasize a hobby horse social justice issue at the expense of whatever else is being discussed. This is straightforward when "whatever else is being discussed" is AI alignment, which the inside view privileges roughly as "more important than everything else, with vague exceptions when the mental health of high-value people who might otherwise do productive work on the topic is at stake."
On a medical research team, I took a little too long to realize that I'd implicitly bought into a shared vision of what's important. We were going to save lives! We weren't going to cure cancer–everyone falls for that trap, aiming too high. We're working on the ground, saving real people, on real timescales. Computer vision can solve the disagreement-among-experts problem in all sorts of medical classification problems, and we're here to fight that fight and win.
So you've gathered a team of AI researchers, some expert, some early-career, to finally take a powerful stab at the alignment problem. A new angle, or more funding, or the right people in the room, whatever belief of comparative advantage you have that inspires hope beyond death with dignity. And you have someone on your team who deeply cares about a complicated social issue you don't understand. Maybe this is their deepest mission, and they see this early-engineer position at your new research org as a stepping stone toward the fairness and accessibility team at Brain that's doing the real work. They do their best to contribute in the team's terms of what's valuable, and they censor themselves constantly, waiting for the right moment to make the pivotal observation that there's not a single cis woman in the room, or that the work we're doing here may be building a future that's even more hostile toward people with developmental disabilities, or this adversarial training scheme has some alarming implications when you consider that the system could learn race as a feature even if we exclude it from the dataset, or something.
I think this is a fair analogue to my situation and, I expect, to that of many people already doing AI research toward a goal other than alignment. It's
I'm being slightly unfair in implying that these are literally interactions I had with real people in the industry. This is more representative of my experiences online and in other spaces with less of a backdrop of professional courtesy. At [FAANG company] these interactions were subtler.
This story is meant to provide answers to your questions 1 and 2. As far as question 3 and making a change, I'm bullish on narratives, aesthetics, anthropology and the like as genuine interventions upstream of AI safety. We're in a social equilibrium where only certain sorts of people can move into AI safety without seriously disrupting the means by which their social needs are met. There are many wonderful people in that set, but it is relatively quite small compared to the set of people who, if they were convinced to genuinely try, could contribute meaningfully.
I would guess this doesn't qualify for bonus points for being reasonably low-hanging. I come from an odd place though: personally sufficiently traumatized by my experiences in AI research that in practical terms contributing there is more or less off limits for me for the time being, yet compelled by AGI ruin narratives and equipped with a substantial relevant technical background. So at least for me, this is the way forward.
Scott Aaronson recently wrote something relevant to these issues:
Max Ra: What would change your mind to explore research on the AI alignment problem? For a week? A month? A semester?
Scott: The central thing would be finding an actual potentially-answerable technical question around AI alignment, even just a small one, that piqued my interest and that I felt like I had an unusual angle on. In general, I have an absolutely terrible track record at working on topics because I abstractly feel like I “should” work on them. My entire scientific career has basically just been letting myself get nerd-sniped by one puzzle after the next.
Matt Putz: [...] do you think that money could ever motivate you to work on AI Alignment. If it was enough money? Can you imagine any amount that would make you say “okay, at this point I’ll switch, I’ll make a full-hearted effort to actually think about this for a year, I’d be crazy to do anything else”. If so, do you feel comfortable sharing that amount (even if it’s astronomically high)?
Scott: For me personally, it’s not about money. For my family, I think a mere, say, $500k could be enough for me to justify to them why I was going on leave from UT Austin for a year to work on AI alignment problems, if there were some team that actually had interesting problems to which I could contribute something.
Shmi: I’d guess that to get attention of someone like Scott, one would have to ask a question that sounds like (but makes more sense than) “what is the separation of complexity classes between aligned and unaligned AI in a particular well defined setup?” or “A potential isomorphism between Eliciting Latent Knowledge and termination of string rewriting” or “Calculating SmartVault action sequences with matrix permanent”
Scott: LOL, yes, that’s precisely the sort of thing it would take to get me interested, as opposed to feeling like I really ought to be interested.
There is also a question on EA Forum about the same issue: What are the coolest topics in AI safety, to a hopelessly pure mathematician?
I wonder how valuable it would be to have a high quality post or sequence on open problems in AI alignment that is substantially optimized for nerd sniping. Is it even possible to make something like this?
Extremely valuable I'd guess, but the whole problem is that alignment is still preparadigmatic. We don't actually know yet what the well-defined nerd snipe questions we should be asking are.
I think that preparadigmatic research and paradigmatic research are two different skill sets, and most Highly Impressive People in mainstream STEM are masters at the latter, not the former.
I do think we're more paradigmatic than we were a year ago, and that we might transition fully some time soon. I've got a list of concrete experiments on modularity in ML systems I'd l...
I stream-of-consciousness'd this out and I'm not happy with how it turned out, but it's probably better I post this than delete it for not being polished and eloquent. Can clarify with responses in comments.
Glad you posted this and I'm also interested in hearing what others say. I've had these questions for myself in tiny bursts throughout the last few months.
When I get the chance to speak to people at an earlier career stage than myself (starting undergrad, or high schoolers attending a mathcamp I went to) who are undecided about their careers, I bring up my interest in AI Alignment and why I think it's important, and share resources with them after the call in case they're interested in learning more about it. I don't have very many opportunities like this because I don't actively seek to identify and "recruit" them. I only bring it up by happenstance (e.g. joining a random discord server for homotopy type theory, seeing an intro by someone who went to the same mathcamp as me and is interested in cogsci, and scheduling a call to talk about my research background in cogsci and how my interests have evolved/led me to alignment over time).
I know very talented people who are around my age at MIT and from a math program I attended; students who are breezing by technical double majors with perfect GPAs, IMO participants, good competitive programmers, etc. Some things that make it hard for me:
In reality, there are things that are incredibly important/attractive for people when pursuing a career. Status, monetary compensation, and recognition (and not being labeled a nutjob) are some big ones.
This is imo the biggest factor holding back (people going into) AI safety research by a wide margin. I personally know at least one very talented engineer who would currently be working on AI safety if the pay was anywhere near what they could make working for big tech companies.
[Edit: I initially thought of this purely tongue in cheek, but maybe there is something here that is worth examining further?]
You have cognitively powerful agents (highly competent researchers) who have incentives (250k+ salaries) to do things that you don't want them to do (create AGIs that are likely unaligned), and you want them to instead do things that benefit humanity (work on alignment).
It seems to me that offering $100k salaries to work for you instead is not an effective solution to this alignment problem. It relies on the agents being already aligned to the extent that a $150k/yr loss is outweighed by other incentives.
If money were not a tight constraint, it seems to me that offering $250k/yr would be worthwhile even if for no other reason than having them not work on racing to AGI.
I’ve tried to raise the topic with smart physics people I know or encounter whenever the opportunity presents itself. So far, the only ones who actually went on to take steps to try and enter alignment already had prior involvement with EA or LW.
For the others, the main reactions I got seemed to be:
I’m not a mind reader of course, so maybe their real reaction was “Quick, say something conciliatory to make this person shut up about the pet topic they are insane about.”
I think there's a bit of a social barrier to asking people with established directions to change careers (outside e.g. EA). People get invested in their current directions and may not perceive "please change careers" well even if tactfully put.
On the flip side, people who are considering changing careers are often pretty open to being told about new opportunities, and I have introduced people to AI safety who were already thinking about a change. I'm not sure that they were sold though...
A conventional approach might lead one to consider that inside the LW / AI safety bubble it borders on taboo to discount the existential threat posed by unaligned AI, but this is almost an inversion of the outside world, even if limited to 25/75 of what LW users might consider "really impressive people."
This is one gateway to one collection of problems associated with spreading awareness of AI alignment, but let's go in a different direction: somewhere more personal.
Fundamentally, it seems a mistake to frame alignment as an AI issue. While unaligned AGI appears to be rapidly approaching and we have good reasons to believe this will probably result in the extinction of our species, there is another, more important alignment problem that underlies, and somewhat parallels the AI alignment problem. Of course, this larger issue is the alignment problem as faced by humanity at large.
Humans are famously unaligned on many levels: with respect to the self, interpersonally, and micro / macro-socially. No good solution to any tier of this problem has been discovered over thousands of years of inquiry. In the 20th century, humans developed technology useful for acquiring a great deal of information about the universe beyond our world, and "coincidentally" our capability of concentrated destruction increased in effectiveness by orders of magnitude, to the scale where killing at least large portions of the species in a short time is plausible. Thus, the question of why we don't see others like us even though there appears to be ample space tended to find answers along the lines of intelligent life destroying itself. Of course, this is the result of an alignment "problem."
Dull humans forecasted that nuclear arms would end the world and slightly smarter humans suggested that we might wait for antimatter, nanotech, genetically engineered pathogens or some other high-impact dangerous technology. As we're seeing now, these problems are difficult. What appears to be less difficult is AGI.
So, even though it's not in the interest of the continuity of the species, humanity can't help but race redundantly at breakneck pace toward this new technological capability, embodying a slightly disguised, concentrated and lethal version of one of the oldest and most fundamental problems our species has ever faced. That AI alignment is not taken more seriously could be seen as a reflection of "really impressive people" actually not having paid much mind to the alignment problems embedded in and endemic to who we are.
Should one introduce really impressive people to AI alignment? Maybe, but one must remember that magic appears unavailable and that for various reasons, it is predictably the case that most people, even "really impressive" people, will not consider the problem to be more than an abstract curiosity even with the best presentation. So evangelizing about AI alignment seems most useful as a fulfillment of one's personal / social interests rather than as an effective way to increase work on saving the species.
Full disclosure: it's not clear that alignment is a meaningful concept, it's not clear that humans have meaningful or consistent values, it's very much not clear that continuing the human species is a good thing (at any point in our history, past, present or future) from an S-risk perspective, and it's not clear that humans have any business rationally evaluating the utility in survival and reproduction as these are goals we're apparently optimized for. So it should be the case that this post is written with less motivation to evangelize.
I have a few questions for the subset of readers who:
I would love to hear your thoughts on some of the following questions: