(Oh, I hadn't read the full thread; now I have. Still no big update, though. Like, I continue to see him as seemingly overconfident in his ability to get those solutions, but I'm not seeing "oh, he would have mistakenly come to think he had a solution when he didn't", if that's what you're trying to say.)
(I did read that one; it's interesting but basically in line with how I think he's overconfident; it's possible one or both of us is incorrectly reading into / not reading into what he wrote there, about his absolute level of confidence in solving the philosophical problems involved.)
Ok. I think I might bow out for now unless there's something especially salient that I should look at, but by way of a bit of summary: I think we agree that Yudkowsky was somewhat overconfident about solving FAI, and that there's a super high bar that should be met before making an AGI, and no one knows how to meet that bar; my guess would be that we disagree about
That's kind of surprising (that this is your response), given that you signed the Superintelligence Statement which seems to contradict this.
I think no one should build AGI. If someone is going to build AGI anyway, then it might be correct to make AGI yourself first, if you have a way to make it actually aligned (hopefully task-ish or something).
Are you now convinced that sovereign-FAI (I'm avoiding "takeover" due to an objection from Habryka and this) was a real and serious plan, or do you want more evidence?
Let me rephrase. I already believed that there had been a plan originally, like 2004 +/- 3 years, to make sovereign AI. When I was entering the field, I don't recall thinking too hard about "what do you do with it", other than thinking about a Task-ish thing, but with Sovereign AI as a good test case for thought experiments. I didn't know when Yudkowsky and others updated (and still don't).
I think you should (if you haven't already) update more towards the view I have of Eliezer, that he is often quite seriously wrong and/or overconfident
I'm still not sure where you're getting this? I mean, there are places where I would disagree with Yudkowsky's action-stances or something. For example, I kinda get the sense that he's planning "as though" he has confident short timelines; I don't think confident short timelines make sense, but that's a separate question from how an individual makes their plans. As an analogy, I'm working almost exclusively on plans that only pay off after multiple decades, which looks like "very confident of long timelines", but I'm not actually very confident of that and I say so...
For example, the rejection of HIA you quoted again seems pretty tepid, which is to say, quite non-confident, explicitly calling out non-confidence. In practice one can only seriously work on one or two different things, so I wonder if you're incorrectly inferring confidence on his part.
Maybe after doing this update, it becomes more plausible that his third idea (Task AGI, which I guess is the first that you personally came into contact with, and then spent years working towards) was also seriously wrong (or more seriously wrong than you think)?
I think I'm not following; I think Task AI is a bad plan, but that's because it's extremely difficult. I think you're asking me to imagine that it is solved, but that we should still have unknown-unknown-type uncertainty about our solution; and I just don't feel like I know how to evaluate that. If there were (by great surprise) some amazing pile of insights that made a safe Task-AGI seem feasible, and that stood up to comprehensive scrutiny (somehow), then it would plausibly be a good plan to actually do. I think you're saying this is somehow overconfident anyway, and maybe I just disagree with that? But it feels hard to disagree with because it's pretty hypothetical.

If your argument is "but other people are overconfident and wrong about their alignment ideas", I think this proves too much? I mean, it seems to prove too much if applied to cryptography, no? Like, you really can make probably-quite-secure systems, though it takes a lot of work and carefulness and stuff, and the guarantees are conditional on certain mathematical conjectures and only apply to some class of attacks. But I mean, the fact that many people with a wide range of abilities could become wrongly overconfident in the security of their hand-rolled system doesn't prove an expert can't know when their system is secure in certain senses.
you seem to have entirely ignored them
Well, I meant to address them in a sweeping / not very detailed way. Basically I'm saying that they don't seem like the sort of thing that should necessarily in real life prevent one from doing a Task-ish pivotal act. In other words, yes, {governance, the world not trusting MIRI, extreme power concentration} are very serious concerns, but in real life I would pretty plausibly--depending on the specific situation--say "yeah ok you should go ahead anyway". I take your point about takeover-FAI; FWIW I had the impression that takeover-FAI was more like a hypothetical for purposes of design-thinking, like "please notice that your design would be really bad if it were doing a takeover; therefore it's also bad for pivotal-task, because pivotal-task is quite difficult and relies on many of the same things as a hypothetical safe-takeover-FAI".
This may seem plausible because new evidence about the technical difficulty of alignment was the main reason MIRI pivoted away from their plan, but I want to argue that actually even without this information, there were good enough arguments back then to conclude that the plan was bad
I think it's not totally implausible that one could have correctly called, well in advance, that the problem itself would be too hard, without actually seeing much of the evidence that resulted from MIRI's and others' attempts. I think one could consider things like "the mind is very complex and illegible", "you have to have a good grasp on the sources of capabilities because of reflective self-modification", and "we have no idea what values are or how intelligence really works", and maybe get justified confidence; I'm not sure.
But it seems like you're not arguing that in this post, but instead saying that it was a bad plan even if alignment was easy? I don't think that's right, given the stakes and given the difficulty of all plausible plans. I think you can do the thing of trying to solve it without being overconfident, and if you do solve it then you use it to end acute risk, but if you don't solve it you don't build it. (And indeed IIUC much of the MIRI researcher blob pivoted to other things due to alignment difficulty.) If hypothetically you had a real solution, and triple- and quadruple-checked everything, and did a sane and moral process to work out governance, then I think I'd want the plan to be executed, including "burn all the GPUs" or similar.
made it very difficult for voices calling for AI pause/stop to attract attention and resources.
This isn't implausible, but could you point to instances / evidence of this? (I.e. that MIRI's plan / other participation caused this.)
My argument, though, is that it is still very possible for the difficulty of alignment to be in the Apollo regime, and that we haven't received much evidence to rule that regime out (I am somewhat skeptical of a P vs. NP level of difficulty, though I think it could be close to that).
Are you skeptical of PvNP-level due to priors or due to evidence? Why those priors / what evidence?
(I think alignment is pretty likely to be much harder than PvNP. Mainly this is because alignment is very very difficult. (Though also note that PvNP has a maybe-possibly-workable approach, https://en.wikipedia.org/wiki/Geometric_complexity_theory, which its creator states might take a mere century, though I presume that's not a serious specific estimate.))
Maybe the real issue is we don't know what AGI will be like, so we can't do science on it yet. Like pre-LLM alignment research, we're pretty clueless.
(This is my position, FWIW. We can ~know some things, e.g. convergent instrumental goals are very likely to either be pursued, or be obsoleted by some even more powerful plan. E.g. highly capable agents will hack into lots of computers to run themselves--or maybe manufacture new computer chips--or maybe invent some surprising way of doing lots of computation cheaply.)
(Also just recording that I appreciate the OP and these threads, and people finding historical info. I think the topic of how "we" have been going wrong on strategy is important. I'm participating because I'm interested, though my contributions may not be very helpful because