Fwiw, I interpreted this as saying that it doesn't work as a safety proposal (see also: my earlier comment). Also seems related to his arguments about ML systems having squiggles.
Yup. You can definitely train powerful systems on imitation of human thoughts, and in the limit this just gets you a powerful mesa-optimizer that figures out how to imitate them.
The question is when you get a misaligned mesa-optimizer relative to when you get superhuman behavior.
I think it's pretty clear that you can get an optimizer which is upstream of the imitation (i.e. whose optimization gives rise to the imitation), or you can get an optimizer which is downstream of the imitation (i.e. which optimizes in virtue of its imitation). Of course most outcomes are messier than those two extremes, but the qualitative distinction still seems really central to these arguments.
I don't think you've made much argument about when the trans... (read more)
Agreed explicitly for the record.
When "List of Lethalities" was posted, I privately wrote a list of where I disagreed with Eliezer
Why privately?! Is there a phenomenon where other people feel concerned about the social reception of expressing disagreement until Paul does? This is a phenomenon common in many other fields - and I'd invoke it to explain how the 'tone' of talk about AI safety shifted so quickly once I came right out and was first to say everybody's dead - and if it's also happening on the other side then people need to start talking there too. Especially if people think they have solutions. They should talk.
OK, sure. First, I updated down on alignment difficulty after reading your lethalities post, because I had already baked the expected-EY-quality doompost into my expectations. I was seriously relieved that you hadn't found any qualitatively new obstacles which might present deep challenges to my new view on alignment.
Here's one stab[1] at my disagreement with your list: Human beings exist, and our high-level reasoning about alignment has to account for the high-level alignment properties[2] of the only general intelligences we have ever ... (read more)
For example, ARC’s report on ELK describes at least 10 difficulties of the same type and severity as the ~20 technical difficulties raised in Eliezer’s list.
I skimmed through the report and didn't find anything that looked like a centralized bullet point list of difficulties. I think it's valuable in general if people say what the problems are that they're trying to solve, and then collect them into a place so people can look them over simultaneously. I realize I haven't done enough of this myself, but if you've already written up the comp... (read more)
I'm not sure if you are saying that you skimmed the report just now and couldn't find the list, or that you think it was a mistake for the report not to contain a "centralized bullet point list of difficulties."
If you are currently looking for the list of difficulties: see the long footnote.
If you think the ELK report should have contained such a list: I definitely don't think we wrote this report optimally, but we tried our best and I'm not convinced this would be an improvement. The report is about one central problem that we attempt to state... (read more)
Best list so far, imo; it's what to beat.
Well, I had to think about this for longer than five seconds, so that's already a huge victory.
If I try to compress your idea down to a few sentences:
The humans ask the AI to produce design tools, rather than designs, such that there's a bunch of human cognition that goes into picking out the particular atomic arrangements or synthesis pathways; and we can piecewise verify that the tool is making accurate predictions; and the tool is powerful enough that we can build molecular nanotech and an uploader by using the tool for an amount of time too short for F... (read more)
Depends what the evil clones are trying to do.
Get me to adopt a solution wrong in a particular direction, like a design that hands the universe over to them? I can maybe figure out the first time through who's out to get me, if it's 200 Yudkowsky-years. If it's 200,000 Yudkowsky-years I think I'm just screwed.
Get me to make any lethal mistake at all? I don't think I can get to 90% confidence period, or at least, not without spending an amount of Yudkowsky-time equivalent to the untrustworthy source.
If I know that it was written by aligned people? I wouldn't just be trying to evaluate it myself; I'd try to get a team together to implement it, and understanding it well enough to implement it would be the same process as verifying whatever remaining verifiable uncertainty was left about the origins, where most of that uncertainty is unverifiable because the putative hostile origin is plausibly also smart enough to sneak things past you.
Maybe one way to pin down a disagreement here: imagine the minimum-intelligence AGI that could write this textbook (including describing the experiments required to verify all the claims it made) in a year if it tried. How many Yudkowsky-years does it take to safely evaluate whether following a textbook which that AGI spent a year writing will kill you?
Infinite? That can't be done?
Consider my vote to be placed that you should turn this into a post, keep going for literally as long as you can, expand things to paragraphs, and branch out beyond things you can easily find links for.
(I do think there's a noticeable extent to which I was trying to list difficulties more central than those, but I also think many people could benefit from reading a list of 100 noncentral difficulties.)
Nearly empty string of uncommon social inputs. All sorts of empirical inputs, including empirical inputs in the social form of other people observing things.
It's also fair to say that, though they didn't argue me out of anything, Moravec and Drexler and Ed Regis and Vernor Vinge and Max More could all be counted as social inputs telling me that this was an important thing to look at.
Well, my disorganized list sure wasn't complete, so why not go ahead and list some of the foreseeable difficulties I left out? Bonus points if any of them weren't invented by me, though I realize that most people may not realize how much of this entire field is myself wearing various trenchcoats.
Sure—that's easy enough. Just off the top of my head, here's five safety concerns that I think are important that I don't think you included:
The fact that there exist functions that are easier to verify than satisfy ensures that adversarial training can never guarantee the absence of deception.
It is impossible to verify a model's safety—even given arbitrarily good transparency tools—without access to that model's training process. For example, you could get a deceptive model that gradient hacks itself in such a way that cryptographically obfuscates i
Well, there's obviously a lot of points missing! And from the amount this post was upvoted, it's clear that people saw the half-assed current form as valuable.
Why don't you start listing out all the missing further points, then? (Bonus points for any that don't trace back to my own invention, though I realize a lot of people may not realize how much of this stuff traces back to my own invention.)
Humans point to some complicated things, but not via a process that suggests an analogous way to use natural selection or gradient descent to make a mesa-optimizer point to particular externally specifiable complicated things.
Several of the points here are premised on needing to do a pivotal act that is way out of distribution from anything the agent has been trained on. But it's much safer to deploy AI iteratively, increasing the stakes, time horizons, and autonomy a little bit each time.
To do what, exactly, in this nice iterated fashion, before Facebook AI Research destroys the world six months later? What is the weak pivotal act that you can perform so safely?
Human raters make systematic errors - regular, compactly describable, predictable errors.

This is indeed one...
To do what, exactly, in this nice iterated fashion, before Facebook AI Research destroys the world six months later? What is the weak pivotal act that you can perform so safely?
Do alignment & safety research, set up regulatory bodies and monitoring systems.
When the rater is flawed, cranking up the power to NP levels blows up the P part of the system.
Not sure exactly what this means. I'm claiming that you can make raters less flawed, for example, by decomposing the rating task, and providing model-generated critiques that help with their rating. Also, as models get more sample efficient, you can rely more on highly skilled and vetted raters.
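To make the decomposition-plus-critiques idea concrete, here is a minimal sketch of what such a rating pipeline might look like; the function names and the per-claim scoring scheme are illustrative assumptions, not a description of any system Paul is referring to.

```python
# Minimal sketch (not any existing system): a rating pipeline where the task is
# decomposed into per-claim judgments and a critique model flags suspect claims
# for the human rater. `split_into_claims` and `model_critique` are hypothetical
# stand-ins for an actual decomposition step and critique model.

def split_into_claims(answer: str) -> list[str]:
    # Hypothetical decomposition: one claim per sentence.
    return [s.strip() for s in answer.split(".") if s.strip()]

def model_critique(claim: str) -> str | None:
    # Hypothetical critique model: returns a critique string for dubious claims,
    # or None if it finds nothing to flag.
    return None

def rate_answer(answer: str, human_rate_claim) -> float:
    """Aggregate per-claim ratings into an overall score in [0, 1]."""
    claims = split_into_claims(answer)
    if not claims:
        return 0.0
    scores = []
    for claim in claims:
        critique = model_critique(claim)
        # The human rates each small claim, with the critique (if any) attached,
        # rather than rating the whole answer in one unassisted judgment.
        scores.append(human_rate_claim(claim, critique))
    return sum(scores) / len(scores)

# Usage: plug in a real human judgment; here a trivial stand-in.
score = rate_answer("The proof uses induction. The base case is n=0.",
                    lambda claim, critique: 1.0 if critique is None else 0.0)
```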
Arbital was meant to support galaxy-brained attempts like this; Arbital failed.
This seems to me like a case of the imaginary hypothetical "weak pivotal act" that nobody can ever produce. If you have a pivotal act you can do via following some procedure that only the AI was smart enough to generate, yet humans are smart enough to verify and smart enough to not be reliably fooled about, NAME THAT ACTUAL WEAK PIVOTAL ACT.
Okay, I will try to name a strong-but-checkable pivotal act.
(Having a strong-but-checkable pivotal act doesn't necessarily translate into having a weak pivotal act. Checkability allows us to tell the difference between a good plan and a trapped plan with high probability, but the AI has no reason to give us a good plan. It will just produce output like "I have insufficient computing power to solve this problem" regardless of whether that's actually true. If we're unusually successful at convincing the AI our checking process is bad when it's actually good,... (read more)
I agree.
I tried something like this much earlier with a single question, "Can you explain why it'd be hard to make an AGI that believed 222 + 222 = 555", and got enough pushback from people who didn't like the framing that I shelved the effort.
This document doesn't look to me like something a lot of people would try to write. Maybe it was one of the most important things to write, but not obviously so. Among the steps (1) get the idea to write out all reasons for pessimism, (2) resolve to try, (3) not give up halfway through, and (4) be capable, I would not guess that 4 is the strongest filter.
Just to state the reigning orthodoxy among the Wise, if not among the general population: the interface between "AI developers" and "one AI" appears to be hugely more difficult, hugely more lethal, and vastly qualitatively different, from every other interface. There's a horrible opsec problem with respect to single defectors in the AI lab selling your code to China which then destroys the world, but this horrible opsec problem has nothing in common with the skills and art needed for the purely technical challenge of building an AGI that doesn't dest... (read more)
The concept of "interfaces of misalignment" does not mainly point to GovAI-style research here (although it also may serve as a framing for GovAI). The concrete domains separated by the interfaces in the figure above are possibly a bit misleading in that sense:
For me, the "interfaces of misalignment" are generating intuitions about what it means to align a complex system that may not even be self-aligned - rather just one aligning part of it. It is expanding not just the space of solutions, but also the space of meanings of "success". (For example, one ext... (read more)
My guess is an attempt to explain where I think we actually differ in "generative intuitions" will be more useful than a direct response to your conclusions, so here it is. How to read it: roughly, this is attempting to just jump past several steps of double-crux to the area where I suspect actual cruxes lie.
Continuity
In my view, your ontology of thinking about the problem is fundamentally discrete. For example, you are imagining a sharp boundary between a class of systems "weak, won't kill you, but also won't help you with alignment" and "st... (read more)
And if humans had a utility function and we knew what that utility function was, we would not need CEV. Unfortunately extracting human preferences over out-of-distribution options and outcomes at dangerously high intelligence, using data gathered at safe levels of intelligence and a correspondingly narrower range of outcomes and options, when there exists no sensory ground truth about what humans want because human raters can be fooled or disassembled, seems pretty complicated. There is ultimately a rescuable truth about what we want, and CEV i... (read more)
IMO, commitment races only occur between agents who will, in some sense, act like idiots, if presented with an apparently 'committed' agent. If somebody demands $6 from me in the Ultimatum game, threatening to leave us both with $0 unless I offer at least $6 to them... then I offer $6 with slightly less than 5/6 probability, so they do no better than if they demanded $5, the amount I think is fair. They cannot evade that by trying to make some 'commitment' earlier than I do. I expect that, whatever is the correct and sane version of this ... (read more)
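To spell out the arithmetic behind that policy, here is a small sketch; the $10 pot (implying a $5 fair split) and the particular epsilon are assumptions for illustration, not part of the original comment.

```python
# Sketch of the expected-value arithmetic behind the probabilistic-acceptance
# policy described above. Assumes a $10 pot with a $5/$5 fair split (implied by
# the numbers in the comment); epsilon is an arbitrary small number.

FAIR_SHARE = 5.0
EPSILON = 1e-6

def demander_expected_payoff(demand: float) -> float:
    """Expected payoff for someone demanding more than the fair share,
    when the other side concedes with probability just under fair/demand."""
    if demand <= FAIR_SHARE:
        return demand  # fair demands are simply accepted
    p_concede = FAIR_SHARE / demand - EPSILON
    return p_concede * demand  # otherwise both walk away with $0

print(demander_expected_payoff(5))  # 5.0       -- demanding the fair split
print(demander_expected_payoff(6))  # ~4.999994 -- demanding $6 does no better
print(demander_expected_payoff(9))  # ~4.999991 -- greedier demands gain nothing
```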
Idea A (for “Alright”): Humanity should develop hardware-destroying capabilities — e.g., broadly and rapidly deployable non-nuclear EMPs — to be used in emergencies to shut down potentially-out-of-control AGI situations, such as an AGI that has leaked onto the internet, or an irresponsible nation developing AGI unsafely.
Sounds obviously impossible in real life, so how about you go do that and then I'll doff my hat in amazement and change how I speak of pivotal acts. Go get gain-of-function banned, even, that should be vastly simpler. Then we can talk ... (read more)
Yes, it was an intentional part of the goal.
If there were any possibility of surviving the first AGI built, then it would be nice to have AGI projects promising to do something that wouldn't look like trying to seize control of the Future for themselves, when, much later (subjectively?), they became able to do something like CEV. I don't see much evidence that they're able to think on the level of abstraction that CEV was stated on, though, nor that they're able to understand the 'seizing control of the Future' failure mode that CEV is meant to preve... (read more)
(I endorse dxu's entire reply.)
I would "destroy the world" from the perspective of natural selection in the sense that I would transform it in many ways, none of which were making lots of copies of my DNA, or the information in it, or even having tons of kids half resembling my old biological self.
From the perspective of my highly similar fellow humans with whom I evolved in context, they'd get nice stuff, because "my fellow humans get nice stuff" happens to be the weird unpredictable desire that I ended up with at the equilibrium of reflection on the weird unpredictable godshatter that... (read more)
Want to +1 that a vaguer version of this was my own rough sense of RNNs vs. CNNs vs. Transformers.
As much as Moravec-1988 and Moravec-1998 sound like they should be basically the same people, a decade passed between them, and I'd like to note that Moravec may legit have been making an updated version of his wrong argument in 1998 compared to 1988 after he had a chance to watch 10 more years pass and make his earlier prediction look less likely.
It does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.
Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend? Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general? Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?
Added: This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics wou... (read more)
I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict
There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orde... (read more)
I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?
Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025. The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.
So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse? Pretty bare port... (read more)
I think IMO gold medal could be well before massive economic impact, I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it's hard reasoning about the tails.
I'd say <4% on end of 2025.
I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are... (read more)
I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%. Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict. Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.
I frame no prediction about whether Paul is under 16%. That's a separate matter. I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.
Ha! Okay then. My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probabilit... (read more)
I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.
Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and m... (read more)
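As an aside on how a 16%-vs-4% gap turns into a bet both sides should accept, here is a sketch of the expected-value arithmetic at intermediate odds; the stake sizes and chosen odds are arbitrary illustrations, not proposed terms.

```python
# Sketch: why a 16%-vs-4% disagreement admits a bet that both parties expect
# to profit from. Stakes and the chosen odds are arbitrary illustrations.

P_YES_BELIEVER = 0.16  # stated probability that the IMO gold falls by end of 2025
P_YES_SKEPTIC = 0.04   # stated probability on the other side

# Terms: if the event happens, the skeptic pays the believer WIN; otherwise the
# believer pays the skeptic LOSE. Implied break-even probability = 10/(90+10) = 10%,
# which sits between the two stated probabilities.
WIN, LOSE = 90.0, 10.0

ev_believer = P_YES_BELIEVER * WIN - (1 - P_YES_BELIEVER) * LOSE
ev_skeptic = (1 - P_YES_SKEPTIC) * LOSE - P_YES_SKEPTIC * WIN

print(f"believer EV: {ev_believer:+.2f}")  # +6.00 under the 16% view
print(f"skeptic  EV: {ev_skeptic:+.2f}")   # +6.00 under the 4% view
```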
Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day. The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" a... (read more)
I'm mostly not looking for virtue points, I'm looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some kind of feedback to help snap you out of it.
I don't think it's surprising if a GPT-3 sized model can do relatively good translation. If talking about this prediction, and if you aren't happy just predicting numbers for overall value added from machine translation, I'd kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.
If they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.
(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)
I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read. I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO chal... (read more)
Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect i... (read more)
I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024
Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.
I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.
I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).
But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life." We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.
I think there's a very real sense in which, yes... (read more)
But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."
Don't you think you're making a falsifiable prediction here?
Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.
I don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that. As for their ability to then make algor... (read more)
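For concreteness, here is a sketch of the kind of trend model being gestured at, with log-perplexity regressed on log parameter count plus a year term for algorithmic progress; all numbers below are made-up placeholders, not real benchmark data.

```python
# Sketch of a "linear model fit to neuron count plus algorithmic progress over
# time" for predicting perplexity. All data are made-up placeholders.
import numpy as np

params = np.array([1.5e8, 1.5e9, 1.75e11])  # placeholder model sizes
years = np.array([2018, 2019, 2020])        # placeholder release years
ppl = np.array([40.0, 28.0, 18.0])          # placeholder perplexities

# Design matrix: [1, log10(params), years since 2018]
X = np.column_stack([np.ones_like(years, dtype=float),
                     np.log10(params),
                     years - 2018])
coef, *_ = np.linalg.lstsq(X, np.log(ppl), rcond=None)

def predict_ppl(n_params: float, year: int) -> float:
    x = np.array([1.0, np.log10(n_params), year - 2018])
    return float(np.exp(x @ coef))

# Extrapolate to a hypothetical 10x larger model released two years later.
print(predict_ppl(1.75e12, 2022))
```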
While GPT-4 wouldn't be a lot bigger than GPT-3, Sam Altman did indicate that it'd use a lot more compute. That's consistent with Stack More Layers still working; they might just have found an even better use for compute.
(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)
My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was.
Neither GJO nor Metaculus are restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%. Here's an example of one such, which I have a potentially false memory of having maybe read at the time: https://www.gjopen.com/comments/118530
I feel like the biggest subjective thing is that I don't feel like there is a "core of generality" that GPT-3 is missing
I just expect it to gracefully glide up to a human-level foom-ing intelligence
This is a place where I suspect we have a large difference of underlying models. What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers? Particularly if you have an answer to anything that sounds like it's in the style of Gwern's questions, because I think those a... (read more)
If you give me 1 or 10 examples of surface capabilities I'm happy to opine. If you want me to name industries or benchmarks, I'm happy to opine on rates of progress. I don't like the game where you say "Hey, say some stuff. I'm not going to predict anything and I probably won't engage quantitatively with it since I don't think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3."
I don't even know which of Gwern's questions you think are interesting/meaningful. "Good meta-learning"--I don't... (read more)
The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).
Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?
Thanks for continuing to try on this! Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. ... (read more)
I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.
I wouldn't be especially surprised by waifutechnology or machine translation jumping to newly accessible domains (the thing I care about and you shrug about (until the world ends)), but is that likely to exhibit a visible economic discontinuity in profits (which you care about and I shrug about (until the world ends))? There'... (read more)
I'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense in either, but I can guess. I'm not sure what "wouldn't be especially surprised" means; I think to actually get disagreements we need way more resolution than that, so one question is whether you are willing to play ball (since presumably you'd also have to look into it to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd sti... (read more)
And to say it also explicitly, I think this is part of why I have trouble betting with Paul. I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day. If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that's something I don't care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwer... (read more)
This seems totally bogus to me.
It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them.
To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care abou... (read more)
What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?
Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.
I would generally be surprised if there were many highly non-li... (read more)
I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.
Well, it sounds like I might be more bullish than you on theorem-proving, possibly. Not on it being useful or profitable, but in terms of underlying technology making progr... (read more)
I'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end u... (read more)
Well put / endorsed / +1.
Define "way more secure". Like, superhuman-at-security AGIs rewrote the systems to be formally unhackable even taking into account hardware vulnerabilities like Rowhammer that violate the logical chip invariants?
Can you talk a bit about the world global... (read more)
An attempted paraphrase, to hopefully-disentangle some claims:
Eliezer, list of AGI lethalities: pivotal acts are (necessarily?) "outside of the Overton window, or something"[1].
Critch, preceding post: Strategies involving non-Overton elements are not worth it
Critch, this post: there are pivotal outcomes you can achieve via a strategy with no non-Overton elements
Eliezer, this comment: the "AI immune system" example is not an example of a strategy with no non-Overton elements
Possible reading: Critch/the reader/Eliezer currently wouldn't be able to name a strategy to... (read more)