CEO at Conjecture.
I don't know how to save the world, but dammit I'm gonna try.
Hard for me to make sense of this. What philosophical questions do you think you'll get clarity on by doing this? What are some examples of people successfully doing this in the past?
The fact that you ask this question is interesting to me, because in my view the opposite question is the more natural one to ask: What kind of questions can you make progress on *without* constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions. The places where we do best and get least drawn astray are exactly those areas where we can get as much feedback from reality, in as tight loops as possible. So if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can! From my point of view, this is the default of successful human epistemology, and the exceptions should be viewed with suspicion.
And for what it's worth, acting in the real world, building a company, raising money, debating people live, building technology, making friends (and enemies), absolutely helped me become far, far less confused, and far more capable of tackling confusing problems! Actually testing my epistemology and rationality against reality, and failing (a lot), has been far more helpful for deconfusing everything from practical decision making skills to my own values than reading/thinking could have ever been in the same time span. There is value in reading and thinking, of course, but I was in a severe "thinking overhang", and I needed to act in the world to keep learning and improving. I think most people (especially on LW) are in an "action underhang."
"Why do people do things?" is an empirical question, it's a thing that exists in external reality, and you need to interact with it to learn more about it. And if you want to tackle even higher level problems, you need to have even more refined feedback. When a physicist wants to understand the fundamentals of reality, they need to set up insane crazy particle accelerators and space telescopes and supercomputers and what not to squeeze bits of evidence out of reality and actually ground whatever theoretical musings they may have been thinking about. So if you want to understand the fundamentals of philosophy and the human condition, by default I expect you are going to need to do the equivalent kind of "squeezing bits out of reality", by doing hard things such as creating institutions, building novel technology, persuading people, etc. "Building a company" is just one common example of a task that forces you to interact a lot with reality to be good.
Fundamentally, I believe that good philosophy should make you stronger and allow you to make the world better, otherwise, why are you bothering? If you actually "solve metaphilosophy", I think the way this should end up looking is that you can now do crazy things. You can figure out new forms of science crazy fast, you can persuade billionaires to support you, you can build monumental organizations that last for generations. Or, in reverse, I expect that if you develop methods to do such impressive feats, you will necessarily have to learn deep truths about reality and the human condition, and acquire the skills you will need to tackle a task as heroic as "solving metaphilosophy."
Everyone dying isn't the worst thing that could happen. I think from a selfish perspective, I'm personally a bit more scared of surviving into a dystopia powered by ASI that is aligned in some narrow technical sense. I'm less sure from an altruistic/impartial perspective, but it seems at least plausible that building an aligned AI without making sure that the future human-AI civilization is "safe" is not a good thing to do.
I think this grounds out into object level disagreements about how we expect the future to go, probably. I think s-risks are extremely unlikely at the moment, and when I look at how best to avoid them, most such timelines don't go through "figure out something like metaphilosophy", but more likely through "just apply bog standard decent humanist deontological values and it's good enough." A lot of the s-risk, in my view, comes from the penchant for maximizing "good" that utilitarianism tends to promote; if we instead aim for "good enough" (which is what most people tend to instinctively favor), that cuts off most of the s-risk (though not all).
To get to the really good timelines, that route through "solve metaphilosophy", there are mandatory previous nodes such as "don't go extinct in 5 years." Buying ourselves more time is powerful optionality, not just for concrete technical work, but also for improving philosophy, human epistemology/rationality, etc.
I don't think I see a short path to communicating the parts of my model that would be most persuasive to you here (if you're up for a call or irl discussion sometime, lmk), but in short, I think of policy, coordination, civilizational epistemology, institution building, and metaphilosophy as closely linked and tractable problems. The trouble is that a small handful of AI labs (largely supported/initiated by EA/LW-types) are dead set on burning the commons as fast as humanly possible. If we had a few more years/decades, I think we could actually make tangible and compounding progress on these problems.
I would say that better philosophy/arguments around questions like this is a bottleneck. One reason for my interest in metaphilosophy that I didn't mention in the OP is that studying it seems least likely to cause harm or make things worse, compared to any other AI related topics I can work on. (I started thinking this as early as 2012.) Given how much harm people have done in the name of good, maybe we should all take "first do no harm" much more seriously?
I actually respect this reasoning. I disagree strategically, but I think this is a very morally defensible position to hold, unlike the mental acrobatics necessary to work at the x-risk factories because you want to be "in the room".
Which also represents an opportunity...
It does! If I was you, and I wanted to push forward work like this, the first thing I would do is build a company/institution! It will both test your mettle against reality and allow you to build a compounding force.
Is it actually that weird? Do you have any stories of trying to talk about it with someone and having that backfire on you?
Yup, absolutely. If you take even a microstep outside of the EA/rat-sphere, these kinds of topics quickly sound utterly alien to almost anyone. Try explaining to a politician worried about job loss, or a middle aged housewife worried about her future pension, or a young high school dropout unable to afford housing, that actually we should be worried about whether we are doing metaphilosophy correctly to ensure that future immortal superintelligences reason correctly about acausal alien gods from math-space so they don't cause them to torture trillions of simulated souls! This is exaggerated for comedic effect, but this is really what even relatively intro level LW philosophy often sounds like to many people by default!
As the saying goes, "Grub first, then ethics." (though I would go further and say that people's instinctive rejection of what I would less charitably call "galaxy brain thinking" is actually often well calibrated)
As someone who does think about a lot of the things you care about at least some of the time (and does care pretty deeply), I can speak for myself on why I don't talk about these things much:
All that being said, I still am glad some people like you exist, and if I could make your work go faster, I would love to do so. I wish I could live in a world where I could justify working with you on these problems full time, but I don't think I can convince myself this is actually the most impactful thing I could be doing at this moment.
I initially liked this post a lot, then saw a lot of pushback in the comments, mostly of the (very valid!) form of "we actually build reliable things out of unreliable things, particularly with computers, all the time". I think this is a fair criticism of the post (and choice of examples/metaphors therein), but I think it may be missing (one of) the core message(s) trying to be delivered.
I wanna give an interpretation/steelman of what I think John is trying to convey here (which I don't know whether he would endorse or not):
"There are important assumptions that need to be made for the usual kind of systems security design to work (e.g. uncorrelation of failures). Some of these assumptions will (likely) not apply with AGI. Therefor, extrapolating this kind of thinking to this domain is Bad™️." ("Epistemological vigilance is critical")
So maybe rather than saying "trying to build robust things out of brittle things is a bad idea", it's more like "we can build robust things out of certain brittle things, e.g. computers, but Godzilla is not a computer, and so you should only extrapolate from computers to Godzilla if you're really, really sure you know what you're doing."
Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!
Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, while Redwood is, we believe, just using pretrained, publicly available models. We do this for three reasons:
We’re also for-profit, while Redwood is a nonprofit, and we’re located in London! Not everyone lives out in the Bay :)
For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top 0.1% scenario here! Like, the difference between egoistical Connor vs altruistic Connor with an aligned AGI in his hands is much, much smaller than the difference between Connor with an aligned AGI and anyone, any organization or any scenario, with a misaligned AGI.
But let’s assume this.
Unfortunately, there is no actual functioning reliable mechanism by which humans can guarantee their alignment to each other. If there was something I could do that would irreversibly bind me to my commitment to the best interests of mankind in a publicly verifiable way, I would do it in a heartbeat. But there isn’t and most attempts at such are security theater.
What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the longterm future (even if of course some people with different models of what is “best for the longterm future” will have legitimate disagreements with my choices of past actions), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.
On a meta-level, I think the best guarantee I can give is simply that not acting in humanity’s best interest is, in my model, Stupid. And my personal guiding philosophy in life is “Don’t Be Stupid”. Human values are complex and fragile, and while many humans disagree about many details of how they think the world should be, there are many core values that we all share, and not fighting with everything we’ve got to protect these values (or dying with dignity in the process) is Stupid.
Ideally, we would like Conjecture to scale quickly. Alignment-wise, in 5 years' time, we want to have the ability to take a billion dollars and turn it into many efficient, capable, aligned teams of 3-10 people working on parallel alignment research bets, and be able to do this reliably and repeatedly. We expect to be far more constrained by talent than anything else on that front, and are working hard on developing and scaling pipelines to hopefully alleviate such bottlenecks.
For the second question, we don't expect it to be a competing force (as in, we have people who could be working on alignment working on product instead). See point two in this comment.
To point 1: While we greatly appreciate what OpenPhil, LTFF and others do (and hope to work with them in the future!), we found that the hurdles required and strings attached were far greater than with the laissez-faire Silicon Valley VCs we encountered, and this route seemed less scalable in the long run. Also, FTX FF did not exist back when we were starting out.
While EA funds as they currently exist are great at handing out small to medium sized grants, the ~8 digit investment we were looking for to get started asap was not something that these kinds of orgs were generally interested in giving out (which seems to be changing lately!), especially to slightly unusual research directions and unproven teams. If our timelines were longer and the VC money had more strings attached (as some of us had expected before seeing it for ourselves!), we may well have gone another route. But the truth of the current state of the market is that if you want to scale to a billion dollars as fast as possible with the most founder control, this is the path we think is most likely to succeed.
To point 2: This is why we will focus on SaaS products on top of our internal APIs that can be built by teams that are largely independent from the ML engineering. As such, this will not compete much with our alignment-relevant ML work. This is basically our thesis as a startup: We expect it to be EV+, as this earns much more money than we would have had otherwise.
Notice this is a contingent truth, not an absolute one. If tomorrow, OpenPhil and FTX contracted us with 200M/year to do alignment work, this would of course change our strategy.
To point 3: We don’t think this has to be true. (Un)fortunately, given the current pace of capability progress, we expect keeping up with the pace to be more than enough for building new products. Competition on AI capabilities is extremely steep and not in our interest. Instead, we believe that (even) the (current) capabilities are so crazy that there is an unlimited potential for products, and we plan to compete instead on building a reliable pipeline to build and test new product ideas.
Calling it competition is actually a misnomer from our point of view. We believe there is ample space for many more companies to follow this strategy, still not have to compete, and turn a massive profit. This is how crazy capabilities and their progress are.
To address the opening quote - the copy on our website is overzealous, and we will be changing it shortly. We are an AGI company in the sense that we take AGI seriously, but it is not our goal to accelerate progress towards it. Thanks for highlighting that.
We don’t have a concrete proposal for how to reliably signal that we’re committed to avoiding AGI race dynamics beyond the obvious right now. There is unfortunately no obvious or easy mechanism that we are aware of to accomplish this, but we are certainly open to discussion with any interested parties about how best to do so. Conversations like this are one approach, and we also hope that our alignment research speaks for itself in terms of our commitment to AI safety.
If anyone has any more trust-inducing methods than us simply making a public statement and reliably acting consistently with our stated values (where observable), we’d love to hear about them!
To respond to the last question: Conjecture has been “in the making” for close to a year now and has not been a secret; we have discussed it in various iterations with many alignment researchers, EAs and funding orgs. A lot of initial reactions were quite positive, in particular towards our mechanistic interpretability work, and there was just general excitement for more people working on alignment. There have of course been concerns around organizational alignment, for-profit status, our research directions and the founders’ history with EleutherAI, all of which we have tried our best to address.
But ultimately, we think whether or not the community approves of a project is a useful signal for whether a project is a good idea, but not the whole story. We have our own idiosyncratic inside-views that make us think that our research directions are undervalued, so of course, from our perspective, other people will be less excited than they should be for what we intend to work on. We think more approaches and bets are necessary, so if we would only work on the most consensus-choice projects we wouldn’t be doing anything new or undervalued. That being said, we don’t think any of the directions or approaches we’re tackling have been considered particularly bad or dangerous by large or prominent parts of the community, which is a signal we would take seriously.
We (the founders) have a research agenda distinct enough from those of most existing groups that simply joining one of them would mean incurring some compromises on that front. Also, joining existing research orgs is tough! Especially if we want to continue along our own lines of research and have significant influence on their direction. We can’t just walk in and say “here are our new frames for GPT, can we have a team to work on this asap?”.
You’re right that SOTA models are hard to develop, but that being said, developing our own models is independently useful in many ways - it enables us to maintain controlled conditions for experiments, and study things like scaling properties of alignment techniques, or how models change throughout training, as well as being useful for any future products. We have a lot of experience in LLM development and training from EleutherAI, and expect it not to take up an inordinate amount of developer hours.
We are all in favor of high bandwidth communication between orgs. We would love to work in any way we can to set these channels up with the other organizations, and are already working on reaching out to many people and orgs in the field (meet us at EAG if you can!).
In general, all the safety orgs that we have spoken with are interested in this, and that’s why we expect/hope this kind of initiative to be possible soon.