I talked to Connor Leahy about Yudkowsky's antimemes in Death with Dignity, common misconceptions about EleutherAI and his new AI Alignment company Conjecture.

Below are some highlighted quotes from our conversation (available on YouTube, Spotify, Google Podcasts, and Apple Podcasts). For the full context of each quote, see the accompanying transcript, organized in 74 sub-sections.

Understanding Eliezer Yudkowsky

Eliezer Has Been Conveying Antimemes

“Antimemes are completely real. There's nothing supernatural about it. Most antimemes are just things that are boring. So things that are extraordinarily boring are antimemes because they, by their nature, resist you remembering them. And there's also a lot of antimemes in various kinds of sociological and psychological literature. A lot of psychology literature, especially early psychology literature, which is often very wrong to be clear. Psychoanalysis is just wrong about almost everything. But the writing style, the kind of thing these people I think are trying to do is they have some insight, which is an antimeme. And if you just tell someone an antimeme, it'll just bounce off them. That's the nature of an antimeme. So to convey an antimeme to people, you have to be very circuitous, often through fables, through stories you have, through vibes. This is a common thing.
Moral intuitions are often antimemes. Things about various human nature or truth about yourself. Psychologists don't tell you, "Oh, you're fucked up, bro. Do this." That doesn't work because it's an antimeme. People have protection, they have ego. You have all these mechanisms that will resist you learning certain things. Humans are very good at resisting learning things that make themselves look bad. So things that hurt your own ego are generally antimemes. So I think a lot of what Eliezer does and a lot of his value as a thinker is that he is able, through however the hell his brain works, to notice and comprehend a lot of antimemes that are very hard for other people to understand.”

Why the Dying with Dignity Heuristic is Useful

“The whole point of the post is that if you do that, and you also fail the test by thinking that blowing up TSMC is a good idea, you are not smart enough to do this. Don't do it. If you're smart enough, you figured out that this is not a good idea... Okay, maybe. But most people, or at least many people, are not smart enough to be consequentialists. So if you actually want to save the world, you actually want to save the world... If you want to win, you don't want to just look good or feel good about yourself, you actually want to win, maybe just think about dying with dignity instead. Because even though you, in your mind, don't model your goal as winning the world, the action that is generated by the heuristic will reliably be better at actually saving the world.”

“There's another interpretation of this, which I think might be better where you can model people like AI_WAIFU as modeling timelines where we don't win with literally zero value. That there is zero value whatsoever in timelines where we don't win. And Eliezer, or people like me, are saying, 'Actually, we should value them in proportion to how close to winning we got'. Because that is more healthy... It's reward shaping! We should give ourselves partial reward for getting partially the way. He says that in the post, how we should give ourselves dignity points in proportion to how close we get.
And this is, in my opinion, a much psychologically healthier way to actually deal with the problem. This is how I reason about the problem. I expect to die. I expect this not to work out. But hell, I'm going to give it a good shot and I'm going to have a great time along the way. I'm going to spend time with great people. I'm going to spend time with my friends. We're going to work on some really great problems. And if it doesn't work out, it doesn't work out. But hell, we're going to die with some dignity. We're going to go down swinging.”

"If you have to solve an actually hard problem in the actual real world, in actual physics, for real, an actual problem, that is actually hard, you can't afford to throw your epistemics out the door because you feel bad. And if people do this, they come up with shit like, 'Let's blow up to TSMC'. Because they throw their epistemics out the window and like, 'This feels like something. Something must be done and this is something, so therefore it must be done'."

EleutherAI

Why training GPT-3 Size Models made sense

“Well, I remember having these conversations with some people in the alignment sphere, where they're like, "Oh well, why did you build the models? Just use GPT-2, that's fine." I'm like, "Well, okay, what if I want to see the bigger properties?" And they'll be like, "They'll probably exist in the smaller models too or something. Name three experiments you're going to do with this exact model." And I'm like, "I could come up with three, sure. But that's kind of missing the point." The point is: we should just really stare at these things really fucking hard. And turns out, in my experience, that was a really good idea. Most of my knowledge, my competitive advantage is gained from that period of just actually building the things, actually staring at them really hard and not just knowing about the OpenAI API existing and reading the papers. There's a lot of knowledge you can get from reading a handbook, but actually running the machine will teach you a lot of things.”

EleutherAI Spread Alignment Memes in the ML World

"One of the important parts of my threat model is that I think 99% of the damage from GPT-3 was done the moment the paper was published. And, as they say about the nuclear bomb, the only secret was that it was possible. And I think there's a bit of naivety that sometimes goes into these arguments, where people are, 'Well, EleutherAI accelerated things, they drew attention to the meme'. And I think there's a lot of hindsight bias there, in that people don't realize how everyone knew about this, except the alignment community. Everyone at OpenAI, Google Brain and DeepMind. People knew about this, and they figured it out fucking fast."

"One of the things that EleutherAI did, and this was very much intentional, is that it created a space that is open to the wider ML community and their norms. It is respectful of AI researchers and their norms. And we also have street cred, in the sense that we are ML researchers and we're not just some dude talking about logical induction or whatever, but still has a very strong alignment meme. Alignment is high status. It is a respectful thing to talk about, a thing to take seriously. It is not some weird thing some people in Berkeley think about. It is a serious topic of serious intrigue. And for what it's worth, of the five core people at EleutherAI that changed their job as a direct consequence of EleutherAI, four went into alignment."

"I'm not saying, was it a resounding success? Did it do everything I wanted? No. It could always have been better. But I like to believe that there was a positive magnetic contagion that happened there. As I say, a lot of people that I know, that were an ML, started taking alignment seriously. I know several professors at several universities that'd gone to EleutherAI through the scaling memes, and then became convinced that this alignment thing seems important potentially."

On the Policy and Impact of EleutherAI's Open Source

"Our official position, which you can read in our blog, which has always been there, is that not everything should be released. And in fact, we, EleutherAI, discovered at least two capabilities advancements ahead of anyone else in the world, and we successfully kept them secret, because we were like "Oh shit". One is the chain of thought prompting idea, which we then later published. I believe I showed Eliezer the pre-draft. So he may be able to confirm that I'm not bullshitting you on this. I think it was Eliezer that I showed that to. And so in that regard, I fully understand why people think this, because that's a default open-source thing. And there're several other open-source groups now, that have split off from Eleuther or they're distant cousins of Eleuther, that do think this way. I strongly disagree with them. And I think that what they're making is not a good idea. It was always contingent. EleutherAI's policy was always "we think this specific thing should be open". Not all things should be open, but this specific thing that we are thinking about right now, that we're talking about right now, this specific thing we think should be open for this, this, this and this reason. But there are other things which we may or may not encounter, which shouldn't be open. We made very clear if we ever had a quadrillion parameter model for some reason, we would not release it."

"Again, I want to be very clear here. It may have been a mistake to release GPT-J. It may have been a mistake. I don't think it is one, for various contingent reasons, but I'm not ideologically committed to the idea that this was definitely the right thing to do. I think given the evidence that I've seen, for example, GPT-J being used in some of my favorite interpretability papers, such as the Editing and Eliciting Knowledge paper from David Bau's lab, which is an excellent paper, and you really should read. And several other groups such as Redwood using GPT-Neo models in their research and such. I think that there are a lot of reasons why this was helpful to some people, this was good. Also, the tacit knowledge that we've gained has been very instrumental for setting up Conjecture and what I do now. So I think there are reasons why it was good, but I could be wrong about this. Again, if people disagree with me about that, I think I disagree, but I think that it's not insane."

Conjecture

How Conjecture Started

"So Conjecture grew a lot out of some of the bottlenecks I found while working in EleutherAI. So EleutherAI was great. I love the people there and such. Anyway, we had a lot of great people and such. But if you wanted to get something done, it was like herding cats. But imagine the cats also have crippling ADHD and are the smartest people you've ever met. Especially if anything boring needed to get done, if we needed to fix some bugs or scrape some data or whatever, it would very often just not get done. Because it was all volunteer based, right? You wanted to do fun things. It's your free time. People don't want to do boring shit. During the pandemic it was a bit different, because people literally didn't have anything really to do. But now you have a social life again, you have a job. And then you don't want to come home and spend two hours debugging some goddamn race condition or whatever."

"So, the idea was first floated very early in EleutherAI, but I put that completely on ice. I didn't want to do that. I wanted to just focus on open-source and such. So it became really concrete around late 2021, September-October I think, when Nat Friedman, who was the CEO of GitHub at the time, approaches EleutherAI and says, 'Hey, I love what you guys are doing. It's super awesome. Can help you with anything? You want to meet up sometime?'. And, to add to his credit, he donated a bunch of money to help EleutherAI to keep going. A man of his word. And he happened to be in Germany at the time, which was where I was as well. And he was, 'Hey, do you want to meet up for a coffee?' And so we met up, really got along, and he was, 'Hey, you ever thought of doing a company or something?' 'Now, I have been thinking about that.' 'Why don't you just come by the Bay sometime and talk' and such. And so I was thinking, 'Oh cool, I can go to the Bay and I can...' So it was a confluence of factors, right? It was an excuse to go to the Bay to talk to both Nat and his friends, but also talk to Open Phil and potential EA funders and stuff like that. And also, I was getting on EleutherAI, I was hitting those bottlenecks I was talking about, where I was trying to do research on EleutherAI but it just wasn't working."

Where Conjecture Fits in the AI Alignment Landscape

"Conjecture differs from many other orgs in the field by various axes. So one of the things is that we take short timelines very seriously. There's a lot of people here and there that definitely entertain the possibility of short timelines or think it's serious or something. But no real org that is fully committed to five year timelines, and act accordingly. And we are an org that takes this completely seriously. Even if we just have 30% on it happening, that is enough in our opinion, to be completely action relevant. Just because there are a lot of things you need to do if this is true, compared to 15-year timelines, that no one's doing, that it seems it's worth trying. So we have very short timelines. We think alignment is very hard. So the thing where we disagree with a lot of other orgs, is we expect alignment to be hard, the kind of problem that just doesn't get solved by default. That doesn't mean it's not solvable. So where I disagree with Eliezer is that, I do think it is solvable... he also thinks it's solvable. He just doesn't think it's solvable in time, which I do mostly agree on. So I think if we had a hundred years time, we would totally solve this. This is a problem that can be solved, but doing it in five years with almost no one working on it, and also we can't do any tests with it because if we did a test, and it blows up, it's already too late, et cetera, et cetera... There's a lot of things that make the problem hard."

"One of the positive things that I've found is just, no matter where I go, the people working in the AGI space specifically are overwhelmingly very reasonable people. I may disagree with them, I think they might be really wrong about various things, but they're not insane evil people, right? They have different models of how reality works from me, and they're like... You know, Sam Altman replies to my DMs on Twitter, right? [...] I very strongly disagree with many of his opinions, but the fact that I can talk to him is not something we should have taken for a given. This is not the case in many other industries, and there's many scenarios where this could go away, and we don't have this thing that everyone in the space knows each other, or can call each other even. So I may not be able to convince Sam of my point of view. The fact I can talk to him at all is a really positive sign, and a sign that I would not have predicted two years ago."

Why Conjecture is Doing Interpretability Research

"I think it's really hard for modern people to put themselves into an epistemic state of just how it was to be a pre-scientific person, and just how confusing the world actually looked. And now even things that we think of as simple, how confusing they are before you actually see the solution. So I think it is possible, not guaranteed or even likely, but it's possible, that such discoveries could not be far down the tech tree, and that if we just come at things from the right direction, we try really hard, we try new things, that we would just stumble upon something where we're just like, 'Oh, this is okay, this works. This is a frame that makes sense. This deconfuses the problem. We're not so horribly confused about everything all the time.'"

Conjecture Approach To Solving Alignment

"If you need to roll high, roll many dice. At Conjecture, the ultimate goal is to make a lot of alignment research happen, to scale alignment research, to scale horizontally, to tile research teams efficiently, to take in capital and convert that into efficient teams with good engineers, good op support, access to computers, et cetera, et cetera, trying different things from different direction, more decorrelated bets."

"To optimize the actual economy is just computationally impossible. You would have to simulate every single agent, every single thing, every interaction, just impossible. So instead what they do is, they identify a small number of constraints that, if these are enforced, successfully shrink the dimension of optimization down to become feasible to optimize within. [...] If you want to reason about how much food will my field produce, monoculture is a really good constraint. By constraining it by force to only be growing, say, one plant, you simplify the optimization problem sufficiently that you can reason about it. I expect solutions to alignment, or, at least the first attempts we have at it, to look kind of similar like this. It'll find some properties. It may be myopia or something, that, if enforced, if constrained, we will have proofs or reasons to believe that neural networks will never do X, Y, and Z. So maybe we'll say, 'If networks are myopic and have this property and never see this in the training data, then because of all this reasoning, they will never be deceptive.' Something like that. Not literally that, but something of that form."

"There is this meme, which is luckily not as popular as it used to be, but there used to be a very strong meme that neural networks are these uninterpretable black boxes. [...] That is just actually wrong. That is just legitimately completely wrong, and I know this for a fact. There is so much structure inside of neural networks. Sure, some of it is really complicated and not obviously easy to understand for a human, but there is so much structure there, and there are so many things we can learn from actually really studying these internal parts... again, staring at the object really hard actually works."

On being non-disclosure by default

"We are non-disclosure by default, and we take info hazards and general infosec and such very seriously. So the reasoning here is not that we won't ever publish anything. I expect that we will publish a lot of the work that we do, especially the interpretability work, I expect us to publish quite a lot of it, maybe mostly all of it, but the way we think about info hazards or general security and this kind of stuff, is that we think it's quite likely that there are relatively simple ideas out there that may come up during the doing of prosaic alignment research that cannot really increase capabilities, that we are messing around with a neural network to try to make it more aligned, or to make it more interpretable or something, and suddenly, it goes boom, and then suddenly it's five times more efficient or something. I think things like this can and will happen, and for this reason, it's very important for us to... I think of info hazard policy, kind of like wearing a seatbelt. It's probably where we'll release most of our stuff, but once you release something into the wild, it's out there. So by default, before we know whether something is safe or not, it's better just to keep our seat belt on and just keep it internal. So that's the kind of thinking here. It's a caution by default. I expect us to work on some stuff that probably shouldn't be published. I think a lot of prosaic alignment work is necessarily capabilities enhancing, making a model more aligned, a model that is better at doing what you wanted to do, almost always makes the model stronger."

"I want to have an organization where it costs you zero social capital to be concerned about keeping something secret. So for example, with the Chinchilla paper, what I've heard is, inside of DeepMind, there was quite a lot of pushback against keeping it secret. Apparently, the safety teams wanted to not publish it, and they got a lot of pushback from the capabilities people because they wanted to publish it. And that's just a dynamic I don't want to exist at Conjecture. I want to be the case that the safety researchers say "Hey, this is kind of scary. Maybe we shouldn't publish it" and that is completely fine. They don't have to worry about their jobs. They still get promotions, and it is normal and okay to be concerned about these things. That doesn't mean we don't publish things. If everyone's like, "Yep, this is good. This is a great alignment tool. We should share this with everybody," then we'll release, of course."

On Building Products as a For-Profit

"The choice to be for profit is very much utilitarian. So it's actually quite funny that on FTX future funds' FAQ, they actually say they suggest to many non-profits to actually try to be for profits if they can. Because this has a lot of good benefits such as being better for hiring, creating positive feedback loops and potentially making them much more long-term sustainable. So the main reason I'm interested [in being a for-profit] is long term sustainability and the positive feedback loops, and also the hiring is nice. So I think there's like a lot of positive things about for-profit companies. There's a lot of negative things, but like it's also a lot of positive things and a lot of negative things with non-profits too, that I think get slipped under the rug in EA. Like in EA it feels like the default is a non-profit and you have to justify going outside of the Overton window."

"The way I think about products at the moment is, I basically think that there are the current state-of-the-art models that have opened this exponentially large field of possible new products that has barely been tapped. GPT-3 opens so many potential useful products that just all will make profitable companies and someone has to pick them. I think without pushing the state of the art at all, we can already make a bunch of products that will be profitable. And most of them are probably going to be relatively boring [...] You want to do a SaaS product, something that helps you with some business task. Something that helps you make a process more efficient inside of a company or something like that. There' tons of these things, which are just like not super exciting, but they're like useful."

Scaling The Alignment Field

"Our advertising quote, unquote, is just like one LessWrong post that was like, "Oh, we're hiring". Right? And we got a ton of great application. Like the signal to noise was actually wild. Like one in three applications were just really good, which like never happens. So, like, incredible. So we got to hire some really phenomenal people for our first hiring round. And so at this point we're already basically at a really enviable position. I mean, it's like, it's annoying, but it's a good problem to have, where we're basically already funding constrained. We're at the point where I have people I want to hire projects for them to do and the management capacity to handle them. And I just don't have the funding at the moment to hire them."

"Conjecture is an organization that is directly tackling the alignment problem and we're a de-correlated bet from the other ones. I'm glad, I'm super glad that Redwood and Anthropic are doing the things they do, but they're kind of doing a very similar direction of alignment research. We're doing something very different and we're doing it at a different location. We have access to a whole new talent pool of European talent that cannot come to the US. We get a lot of new people into the field. We also have the EleutherAI people coming in, different research directions and de-correlated bets. And we can scale. We have a lot of operational capacity, a lot of experience and also entrepreneurial vigor."