Yeah, I wanted to hear your actual thoughts first, but I considered going into four possible objections:
Offense is favored over defense because, e.g., one AI can just nuke the other. The asymmetries come from physics, where you can't physically build shields that are more resilient than the strongest shield-destroying tech. Absent new physics, extra intelligence doesn't fundamentally change this dynamic, though it can buy you more time in which to strike first.
(E.g., being smarter may let you think faster, or may let you copy yourself to more locations so it takes more time for nukes or nanobots to hit every copy of you. But it doesn't let you build a wall where you can just hang out on Earth with another superintelligence and not worry about the other superintelligence breaking your wall.)
I got the impression Eliezer's claiming that a dangerous superintelligence is merely sufficient for nanotech.
No, I'm pretty confident Eliezer thinks AGI is both necessary and sufficient for nanotech. (Realistically/probabilistically speaking, given plausible levels of future investment into each tech. Obviously it's not logically necessary or sufficient.) Cf. my summary of Nate's view in Nate's reply to Joe Carlsmith:
Nate agrees that if there's a sphexish way to build world-saving nanosystems, then this should immediately be the top priority, and would be
Reply by acylhalide on the EA Forum:
The AGI would rather write programs to do the grunt work, than employ humans, as they can be more reliable, controllable, etc. It could create such agents by looking into its own source code and copying / modifying it. If it doesn't have this capability it will spend time researching (could be years) until it does. On a thousand-year timescale it isn't clear why an AGI would need us for anything besides say, specimens for experiments.
Also as reallyeli says, having a single misaligned agent with absolute control of our future seems terrible no matter what the agent does.
[W]iping out humanity is the most expensive of these options and the AGI would likely get itself destroyed while trying to do that[.]
It would be pretty easy and cheap for something much smarter than a human to kill all humans. The classic scenario is:
A. [...] The notion of a 'superintelligence' is not that it sits around in Goldman Sachs's basement trading stocks for its corporate masters. The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then
Reply by reallyeli on the EA Forum:
Toby Ord's definition of an existential catastrophe is "anything that destroys humanity's longterm potential." The worry is that misaligned AGI which vastly exceeds humanity's power would be basically in control of what happens with humans, just as humans are, currently, basically in control of what happens with chimpanzees. It doesn't need to kill all of us in order for this to be a very, very bad outcome.
E.g. the enslavement by the steel-loving AGI you describe sounds like an existential catastrophe, if that AGI is suff
My Eliezer-model thinks pivotal acts are genuinely, for-real, actually important. Like, he's not being metaphorical or making a pedagogical point when he says (paraphrasing) 'we need to use the first AGI systems to execute a huge, disruptive, game-board-flipping action, or we're all dead'.
When my Eliezer-model says that the most plausible pivotal acts he's aware of involve capabilities roughly at the level of 'develop nanotech' or 'put two cellular-identical strawberries on a plate', he's being completely literal. If some significantly weaker capability le...
For practical purposes, I'd say the pandemic is already over. MIRI isn't doing much hiring, though it's doing a little. The two big things we feel bottlenecked on are:
For 2, I think the best way to get hired by MIRI is to prove your abilities via the Visible Thoughts Pr...
Echoing that I loved these conversations and I'm super grateful to everyone who participated — especially Richard, Paul, Eliezer, Nate, Ajeya, Carl, Rohin, and Jaan, who contributed a lot.
I don't plan to try to summarize the discussions or distill key take-aways myself (other than the extremely cursory job I did on https://intelligence.org/late-2021-miri-conversations/), but I'm very keen on seeing others attempt that, especially as part of a process to figure out their own models and do some evaluative work.
I think I'd rather see partial summaries/respons...
Question from evelynciara on the EA Forum:
Do you believe that AGI poses a greater existential risk than other proposed x-risk hazards, such as engineered pandemics? Why or why not?
For sure. It's tricky to wipe out humanity entirely without optimizing for that in particular -- nuclear war, climate change, and extremely bad natural pandemics look to me like they're at most global catastrophes, rather than existential threats. It might in fact be easier to wipe out humanity by engineering a pandemic that's specifically optimized for this task (than it is to develop AGI), but we don't see vast resources flowing into humanity-killing-virus projects, the way that we see vast resources flowing into AGI projects. By my accounting, most other...
I mean I had an impression that pretty much everyone assigned >5% probability to "if we scale we all die" so it's already enough reason to work on global coordination on safety.
What specific actions do you have in mind when you say "global coordination on safety", and how much of the problem do you think these actions solve?
My own view is that 'caring about AI x-risk at all' is a pretty small (albeit indispensable) step. There are lots of decisions that hinge on things other than 'is AGI risky at all'.
I agree with Rohin that the useful thing is trying t...
But also my sense is that there's some deep benefit from "having mainlines" and conversations that are mostly 'sentences-on-mainline'?
I agree with this. Or, if you feel ~evenly split between two options, have two mainlines and focus a bunch on those (including picking at cruxes and revising your mainline view over time).
Like, it feels to me like Eliezer was generating sentences on his mainline, and Richard was responding with 'since you're being overly pessimistic, I will be overly optimistic to balance', with no attempt to have his response match his
If we have some way to limit an AI's strategy space, or limit how efficiently and intelligently it searches that space, then we can maybe recapitulate some of the stuff that makes humans safe (albeit at the cost that the debate answers will probably be way worse — but maybe we can still get nanotech or whatever out of this process).
If that's the plan, then I guess my next question is how we should go about limiting the strategy space and/or reducing the search quality? (Taking into account things like deception risk.)
Alternatively, maybe you think that som...
In that particular non-failure story, I'm definitely imagining that they aren't "trying to win the debate" (where "trying" is a very strong word that implies taking over the world to win the debate).
Suppose I'm debating someone about gun control, and they say 'guns don't kill people; people kill people'. Here are four different scenarios for how I might respond:
Like, fundamentally the question is something like "how efficient and accurate is the AI research market?"
I would distinguish two factors:
You could turn the "powerful and well-directed" dial up to the maximum allowed by physics, and still not thereby guarantee that information asymmetries are rare, because the way that a society applies maximum optimization pressure to reaching AGI ASAP might route through a lot of indiv...
or honestly panic about not having achieved it and halt, by which point a runner-up who doesn’t understand the importance of alignment/corrigibility/obedience deploys their system which destroys the world
Note that this is still better than 'honestly panic about not having achieved it and throw caution to the wind / rationalize reasons they don't need to halt'!
A more recent explanation of CEV by Eliezer: https://arbital.com/p/cev/
We have now received the first partial run that meets our quality bar. The run was submitted by LessWrong user Vanilla_cabs. Vanilla's team is still expanding the run (and will probably fix some typos, etc. later), but I'm providing a copy of it here with Vanilla's permission, to give others an example of the kind of thing we're looking for:
Vanilla's run is currently 266 steps long. Per the Visible Thoughts Project FAQ, we're willing to pay authors $20 / step for partial ru...
In case you missed it: we now have an FAQ for this project, last updated Jan. 7.
how do you get some substance into every human's body within the same 1 second period? Aren't a bunch of people e.g. in the middle of some national park, away from convenient air vents? Is the substance somehow everywhere in the atmosphere all at once?
I think the intended visualization is simply that you create a very small self-replicating machine, and have it replicate enough times in the atmosphere that every human-sized organism on the planet will on average contain many copies of it.
One of my co-workers at MIRI comments:
(further conjunctive detail for
Reply by Holden Karnofsky: https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/nNqXfnjiezYukiMJi
Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.
Sounds to me like one of the things Eliezer is pointing at in Hero Licensing:
Look, thinking things like that is just not how the inside of my head is organized. There’s just the book I have in my head and the question of whether I can translate that image into reality. My mental world is about the book, not about me.
You do want to train your brain, and you want to understand your strengths and weaknesses. But dwelling on your biases at the ex...
Is this 5 years of engineering effort and then humans leaving it alone with infinite compute?
Maybe something like '5 years of engineering effort to start automating work that qualitatively (but incredibly slowly and inefficiently) is helping with AI research, and then a few decades of throwing more compute at that for the AI to reach superintelligence'?
With infinite compute you could just recapitulate evolution, so I doubt Paul thinks there's a crux like that? But there could be a crux that's about whether GPT-3.5 plus a few decades of hardware progress achieves superintelligence, or about whether that's approximately the fastest way to get to superintelligence, or something.
When I try to mentally simulate negative reader-reactions to the dialogue, I usually get a complicated feeling that's some combination of:
I think part of what I was reacting to is a kind of half-formed argument that goes something like:
I had mixed feelings about the dialogue personally. I enjoy the writing style and think Eliezer is a great writer with a lot of good opinions and arguments, which made it enjoyable.
But at the same time, it felt like he was taking down a strawman. Maybe you’d label it part of “conflict aversion”, but I tend to get a negative reaction to take-downs of straw-people who agree with me.
To give an unfair and exaggerated comparison, it would be a bit like reading a take-down of a straw-rationalist in which the straw-rationalist occasionally insists such things as ...
Note: I've written up short summaries of each entry in this sequence so far on https://intelligence.org/late-2021-miri-conversations/, and included links to audio recordings of most of the posts.
I've gotten one private message expressing more or less the same thing about this post, so I don't think this is a super unusual reaction.
I don't know Eliezer's view on this — presumably he either disagrees that the example he gave is "mundane AI safety stuff", or he disagrees that "mundane AI safety stuff" is widespread? I'll note that you're a MIRI research associate, so I wouldn't have auto-assumed your stuff is representative of the stuff Eliezer is criticizing.
Safely Interruptible Agents is an example Eliezer's given in the past of work that isn't "real" (back in 2017):
[...] It seems to me that I've watched organizations like OpenPhil try to sponsor academics to work on AI alignment, and
Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory's proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
My Eliezer-model doesn't categorically object to this. See, e.g., Fake Causality:
[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may fe
(This post was partly written as a follow-up to Eliezer's conversations with Paul and Ajeya, so I've inserted it into the conversations sequence.)
It does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.
My Eliezer-model is a lot less surprised by lulls than my Paul-model (because we're missing key insights for AGI, progress on insights is jumpy and hard to predict, the future is generally very unpredictable, etc.). I don't know exactly how large of a lull or winter would start to surprise Eliezer (or how much that surprise would change if the lull is occurring two years from now, vs. ten years from now, for example).
In Yudkowsky and Christiano Discuss "Takeoff Speeds", Eliezer says:
I have a rough intuitive feeling that it [AI progress] was going faster in
Found two Eliezer-posts from 2016 (on Facebook) that I feel helped me better grok his perspective.
Sep. 14, 2016:
It is amazing that our neural networks work at all; terrifying that we can dump in so much GPU power that our training methods work at all; and the fact that AlphaGo can even exist is still blowing my mind. It's like watching a trillion spiders with the intelligence of earthworms, working for 100,000 years, using tissue paper to construct nuclear weapons.
And earlier, Jan. 27, 2016:
People occasionally ask me about signs that the remaining timeline
Minor note: This post comes earlier in the sequence than Christiano, Cotra, and Yudkowsky on AI progress. I posted the Christiano/Cotra/Yudkowsky piece sooner, at Eliezer's request, to help inform the ongoing discussion of "Takeoff Speeds".
Transcript error fixed -- the line that previously read
I expect it to go away before the end of days
but with there having been a big architectural innovation, not Stack More Layers
if you name 5 possible architectural innovations I can call them small or large
but with there having b
(Ah, EY already replied.)
It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%).
My model of Eliezer may be wrong, but I'd guess that this isn't a domain where he has many over-50% predictions of novel events at all? See also 'I don't necessarily expect self-driving cars before the apocalypse'.
My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like 'maybe we'll make insane progress on theorem-proving (or whatever) out of the bl...
(... Admittedly, you read fast enough that my 'skimming' is your 'reading'. 😶)
Yeah, even I wasn't sure you'd read those three things, Eliezer, though I knew you'd at least glanced over 'Takeoff Speeds' and 'Biological Anchors' enough to form opinions when they came out. :)
I grimly predict that the effect of this dialogue on the community will be polarization
Beware of self-fulfilling prophecies (and other premature meta)! If both sides in a dispute expect the other side to just entrench, then they're less likely to invest the effort to try to bridge the gap.
This very comment section is one of the main things that will determine the community's reaction, and diverting our focus to 'what will our reaction be?' before we've talked about the object-level claims can prematurely lock in a certain reaction.
(That said, I think you'r...
Fair enough! I too dislike premature meta, and feel bad that I engaged in it. However... I do still feel like my comment probably did more to prevent polarization than cause it? That's my independent impression at any rate. (For the reasons you mention).
I certainly don't want to give up! In light of your pushback I'll edit to add something at the top.
You obviously can do whatever you want, but I find myself confused at this idea being discarded. Like, it sounds exactly like the antidote to so much confusion around these discussions and your position, such that if that was clarified, more people could contribute helpfully to the discussion, and either come to your side or point out non-trivial issues with your perspective. Which sounds really valuable for both you and the field!
I'ma guess that Eliezer thinks there's a long list of sequences he could write meeting these conditions, each on a different topic.
Possible outcomes are in the mind of a world-modeller - reality just is as it is (exactly one way) and isn't made of possibilities. So in what sense do the consequentialist-like things Yudkowsky is referring to funnel history?
I'm not sure that I understand the question, but my intuition is to say: they funnel world-states into particular outcomes in the same sense that literal funnels funnel water into particular spaces, or in the same sense that a slope makes things roll down it.
If you find water in a previously-empty space with a small aperture, and you'...
Thanks for the replies! I'm still somewhat confused but will try again to both ask the question more clearly and summarise my current understanding.

What, in the case of consequentialists, is analogous to the water funnelled by literal funnels? Is it possibilities-according-to-us? Or is it possibilities-according-to-the-consequentialist? Or is it neither (or both) of those?
To clarify a little what the options in my original comment were, I'll say what I think they correspond to for literal funnels. Option 1 corresponds to the fact that funnels are usually n...
I love this post. Thanks, John.
This is the first post in a sequence, consisting of the logs of a Discord server MIRI made for hashing out AGI-related disagreements with Richard Ngo, Open Phil, etc.
I did most of the work of turning the chat logs into posts, with lots of formatting help from Matt Graves and additional help from Oliver Habryka, Ray Arnold, and others. I also hit the 'post' button for Richard and Eliezer. (I don't plan to repeat this note on future posts in this sequence, unless folks request it.)
I suspect a third important reason is that MIRI thinks alignment is mostly about achieving a certain kind of interpretability/understandability/etc. in the first AGI systems. Most ML experiments either aren't about interpretability and 'cracking open the hood', or they're not approaching the problem in a way that MIRI's excited by.
E.g., if you think alignment research is mostly about testing outer reward functions to see what first-order behavior they produce in non-AGI systems, rather than about 'looking in the learned model's brain' to spot mesa-optimizat...
This is a good comment! I also agree that it's mostly on MIRI to try to explain its views, not on others to do painstaking exegesis. If I don't have a ready-on-hand link that clearly articulates the thing I'm trying to say, then it's not surprising if others don't have it in their model.
And based on these comments, I update that there's probably more disagreement-about-MIRI than I was thinking, and less (though still a decent amount of) hyperbole/etc. If so, sorry about jumping to conclusions, Adam!
(I suspect there are a bunch of other disagreements going into this too, including basic divergences on questions like 'What's even the point of aligning AGI? What should humanity do with aligned AGI once it has it?'.)
I would agree with you that "MIRI hates all experimental work" / etc. is not a faithful representation of this state of affairs, but I think there is nevertheless an important disagreement MIRI has with typical ML people, and that the disagreement is primarily about what we can learn from experiments.
Ooh, that's really interesting. Thinking about it, I think my sense of what's going on is (and I'd be interested to hear how this differs from your sense):
Thanks. For time/brevity, I'll just say which things I agree / disagree with:

> sufficiently capable and general AI is likely to have property X as a strong default [...]
I generally agree with this, although for certain important values of X (such as "fooling humans for instrumental reasons") I'm probably more optimistic than you that there will be a robust effort to get not-X, including by many traditional ML people. I'm also probably more optimistic (but not certain) that those efforts will succeed.
[inside view, modest epistemology]: I don't have...
So, the point of my comments was to draw a contrast between having a low opinion of "experimental work and not doing only decision theory and logic", and having a low opinion of "mainstream ML alignment work, and of nearly all work outside the HRAD-ish cluster of decision theory, logic, etc." I didn't intend to say that the latter is obviously-wrong; my goal was just to point out how different those two claims are, and say that the difference actually matters, and that this kind of hyperbole (especially when it never gets acknowledged later as 'oh yeah, th...
Not sure if this helps, and haven't read the thread carefully, but my sense is your framing might be eliding distinctions that are actually there, in a way that makes it harder to get to the bottom of your disagreement with Adam. Some predictions I'd have are that:
* For almost any experimental result, a typical MIRI person (and you, and Eliezer) would think it was less informative about AI alignment than I would.
* For almost all experimental results you would think they were so much less informative as to not be worthwhile.
* There's a sma...
'Experimental work is categorically bad, but Redwood's work doesn't count' does not sound like a "slight caveat" to me! What does this generalization mean at all if Redwood's stuff doesn't count?
(Neither, for that matter, does the difference between 'decision theory and logic' and 'all mathy stuff MIRI has ever focused on' seem like a 'slight caveat' to me -- but in that case maybe it's because I have a lot more non-logic, non-decision-theory examples in my mind that you might not be familiar with, since it sounds like you haven't read much MIRI stuff?).
(Responding to entire comment thread) Rob, I don't think you're modeling what MIRI looks like from the outside very well.
! Yay! That's really great to hear. :)
Thanks for adding the note! :)
I'm confused. When I say 'that's just my impression', I mean something like 'that's an inside-view belief that I endorse but haven't carefully vetted'. (See, e.g., Impression Track Records, referring to Naming Beliefs.)
Example: you said that MIRI has "contempt with experimental work and not doing only decision theory and logic".
My prior guess would have been that you don't actually, for-real believe that -- that it's not your 'impression' in the above sense, more like 'unendorsed venting/hyperbole that has a more complicated r...