Curated. I've heard a few offhand comments about this type of research work in the past few months, but wasn't quite sure how seriously to take it.
I like this writeup for spelling out details of why blackbox investigators might be useful, what skills the work requires, and how you might go about it.
I expect this sort of skillset to have major limitations, but I think I agree with the stated claims that it's a useful skillset to have in conjunction with other techniques.
GPT-N that you can prompt with "I am stuck with this transformer architecture trying to solve problem X". GPT-N would be AIHHAI if it answers along the lines of "In this arXiv article, they used trick Z to solve problems similar to X. Have you considered implementing it?", and using an implementation of Z would solve X >50% of the time.
I haven't finished reading the post, but I found it worthwhile for this quote alone. This is the first description I've read of how GPT-N could be transformative. (Upon reflection this was super obvious and I'm embarrasse...
Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category.
Yeah. I think this sort of thing is why Eliezer thinks we're doomed – getting humanity to coordinate collectively seems doomed (e.g. see Gain of Function Research), and there are no weak pivotal acts that aren't basically impossible to execute safely.
The nanomachine gpu-melting...
Followup point on the Gain-of-Function-Ban as practice-run for AI:
My sense is that the biorisk people who were thinking about Gain-of-Function-Ban were not primarily modeling it as a practice run for regulating AGI. This may result in them not really prioritizing it.
I think biorisk is significantly lower than AGI risk, so if it's tractable and useful to regulate Gain of Function research as a practice run for regulating AGI, it's plausible this is actually much more important than business-as-usual biorisk.
BUT I think smart people I know seem to disa...
Various thoughts that this inspires:
Gain of Function Ban as Practice-Run/Learning for relevant AI Bans
I have heard vague-musings-of-plans in the direction of "get the world to successfully ban Gain of Function research, as a practice-case for getting the world to successfully ban dangerous AI."
I have vague memories of the actual top bio people around not being too focused on this, because they thought there were easier ways to make progress on biosecurity. (I may be conflating a few different statements – they might have just been critiquing a particular ...
+1 to the distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".
I'm sympathetic to a view that says something like "yeah, regulating AI is Hard, but it's also necessary because a unilateral pivotal act would be Bad". (TBC, I'm not saying I agree with that view, but it's at least coherent and not obviously incompatible with how the world actually works.) To properly make that case, one has to argue some combination of:
Also, like, Berkeley heat waves may just be significantly different than, like, Reno heat waves. My current read is that part of the issue here is that a lot of places don't actually get that hot so having less robustly good air conditioners is fine.
Not sure I buy this – I have a model of how hard it is to be deceptive, and how competent our current ML systems are, and it looks like it's more like "as competent as a deceptive four-year-old" (my parents totally caught me when I told my first lie), than "as competent as a silver-tongued sociopath playing a long game."
I do expect there to be signs of deceptive alignment, in a noticeable fashion before we get so-deceptive-we-don't-notice deception.
I've long had a vague sense that interpretability should be helpful somehow, but recently when I tried to spell out exactly how it helped I had a surprisingly hard time. I appreciated this post's exploration of the concept.
I've had on my TODO to try reading the LW post transcript of that and seeing if it could be distilled further.
As someone kinda confused about InfraBayesianism but has some sense that it's important, I am glad to see this initiative. :)
To what extent do you think pivotal-acts-in-particular are strategically important (i.e. "successfully do a pivotal act, and if necessary build an AGI to do it" is the primary driving goal), vs "pivotal acts are useful shorthand to refer to the kind of intelligence level where it matters that an AGI be 'really safe'"?
I'm interested in particular in responses from Eliezer, Rohin, and perhaps Richard Ngo. (I've had private chats with Rohin that I thought were useful to share and this comment is sort of creating a framing device for sharing them, but I've bee...
My Eliezer-model thinks pivotal acts are genuinely, for-real, actually important. Like, he's not being metaphorical or making a pedagogical point when he says (paraphrasing) 'we need to use the first AGI systems to execute a huge, disruptive, game-board-flipping action, or we're all dead'.
When my Eliezer-model says that the most plausible pivotal acts he's aware of involve capabilities roughly at the level of 'develop nanotech' or 'put two cellular-identical strawberries on a plate', he's being completely literal. If some significantly weaker capability le...
Curated. I found the entire sequence of conversations quite valuable, and it seemed good both to let people know it had wrapped up, and curate it while the AMA was still going on.
Thanks. I wasn't super satisfied with the way I phrased my questions. I just made some slight edits to them (labeled as such), although they still don't feel like they quite do the thing. (I feel like I'm looking at a bunch of subtle frame disconnects, while multiple other frame disconnects are going on, so pinpointing the thing is hard.)
I think "is any of this actually cruxy" is maybe the most important question and I should have included it. You answered "not supermuch, at least compared to models of intelligence". Do you think there's any similar nearby ...
[I think there's a thing Eliezer does a lot, which I have mixed feelings about, which is matching people's statements to patterns and then responding to the generator of the pattern in Eliezer's head, which only sometimes corresponds to the generator in the other person's head.]
I want to add an additional meta-pattern – there was once a person who thought I had a particular bias. They'd go around telling me "Ray, you're exhibiting that bias right now. Whatever rationalization you're coming up with right now, it's not the real reason you're arguing X." An...
It seems to me that a major crux about AI strategy routes through "is civilization generally adequate or not?". It seems like people have pretty different intuitions and ontologies here. Here's an attempt at some questions of varying levels of concreteness, to tease out some worldview implications.
(I normally use the phrase "civilizational adequacy", but I think that's kinda a technical term that means a specific thing and I think maybe I'm pointing at a broader concept.)
"Does civilization generally behave sensibly?" This is a vague question, some po...
I don't think this is the main crux -- disagreements about mechanisms of intelligence seem far more important -- but to answer the questions:
Do you think major AI orgs will realize that AI is potentially worldendingly dangerous, and have any kind of process at all to handle that?
Clearly yes? They have safety teams that are focused on x-risk? I suspect I have misunderstood your question.
(Maybe you mean the bigger tech companies like FAANG, in which case I'm still at > 95% on yes, but I suspect I am still misunderstanding your question.)
(I know less about...
There's something I had interpreted the original CEV paper to be implying, but wasn't sure if it was still part of the strategic landscape, which was "have the alignment project be working towards a goal that was highly visibly fair, to disincentivize races." Was that an intentional part of the goal, or was it just that CEV seemed something like "the right thing to do" (independent of its impact on races)?
How does Eliezer think about it now?
Yes, it was an intentional part of the goal.
If there were any possibility of surviving the first AGI built, then it would be nice to have AGI projects promising to do something that wouldn't look like trying to seize control of the Future for themselves, when, much later (subjectively?), they became able to do something like CEV. I don't see much evidence that they're able to think on the level of abstraction that CEV was stated on, though, nor that they're able to understand the 'seizing control of the Future' failure mode that CEV is meant to preve...
I'm currently working to form my own models here. I'm not sure if this post concretely helped me but it's nice to see other people grappling with it.
One thing I notice is that this post is sort of focused on "developing inside views as a researcher, so you can do research." But an additional lens here is "Have an inside view so you can give advice to other people, or do useful community building, or build useful meta-tools for research, or fund research."
In my case I already feel like I have a solid inside view of "AGI is important", and "timelines might be...
Here is a link to the transcript, which includes the ability to watch along with the video: https://www.rev.com/transcript-editor/shared/QmH6Ofy5AXbQ4siBlLNcvUnMkMBj3qa4WIkQtGeoOlo4K3DvjOH3oMUJuIAUBrJiJkJbb4VU3uqWhLLwRu19f3m6gag?loadFrom=SharedLink
I've put in a request for a transcript.
Are there already plans for a transcript of this? (I could set a rev.com transcription in motion.)
[Ngo][15:20] Again, this is an argument that I believed less after looking into the details, because right now it's pretty difficult to throw more compute at neural networks at runtime.
Which is not to say that it's a bad argument, the differences in compute-scalability between humans and AIs are clearly important. But I'm confused about the structure of your argument that knowing more details will predictably update me in a certain direction.
[Yudkowsky][15:21] I suppose the genericized version of my actual response to that would be, "architectu
I haven't had time to reread this sequence in depth, but I wanted to at least touch on how I'd evaluate it. It seems to be aiming to be both a good introductory sequence and a "complete and compelling case I can for why the development of AGI might pose an existential threat".
The question is who this sequence is for, what its goal is, and how it compares to other writing targeting similar demographics.
Some writing that comes to mind to compare/contrast it with includes:
It strikes me that this post looks like (AFAICT?) a stepping stone towards the Eliciting Latent Knowledge research agenda, which currently has a lot of support/traction. That makes this post fairly historically important.
I've highly voted this post for a few reasons.
First, this post contains a bunch of other individual ideas I've found quite helpful for orienting. Some examples:
But my primary reason was learning Critch's views on what research fields are promising, and how they fit into his worldview. I'm not sure if I agree with Critch, but I think "Figur...
A year later, as we consider this for the 2020 Review, I think figuring out a better name is worth another look.
Another option is "AI Catastrophe from First Principles"
Curated. EfficientZero seems like an important advance, and I appreciate this post's lengthy explanation, broken into sections that made it easy to skim past parts I already understood.
Curated. This post matched my own models of how folk tend to get into independent alignment research, and I've seen some people whose models I trust more endorse the post as well. Scaling good independent alignment research seems very important.
I do like that the post also specifies who shouldn't be going to independent research.
So... I totally think there are people who sort of nod along with Paul, using it as an excuse to believe in a rosier world where things are more comprehensible and they can imagine themselves doing useful things without having a plan for solving the actual hard problems. Those types of people exist. I think there's some important work to be done in confronting them with the hard problem at hand.
But, also... Paul's world AFAICT isn't actually rosier. It's potentially more frightening to me. In Smooth Takeoff world, you can't carefully plan your pivotal act ...
Update: I originally posted this question over here, then realized this post existed and maybe I should just post the question here. But then it turned out people had already started answering my question-post, so, I am declaring that the canonical place to answer the question.
Can someone give a rough explanation of how this compares to the recent Deepmind atari-playing AI:
And, for that matter, how both of them compare to the older deepmind paper:
Are they accomplishing qualitatively different things? The same thing but better?
Curated. I don't think we've curated an AMA before, and I'm not sure if I have a principled opinion on doing that, but this post seems chock full of small useful insights, and fragments of ideas that seem like they might otherwise take a while to get written up more comprehensively, which I think is good.
Curated. I appreciated this post for a combination of:
I also wanted to highlight this section:
Finally, should also mention that I agree with Tom Dietterich’s view (dietterich2019robust) that we should make AI safer to society by learning from high-reliability organiz
There's a lot of intellectual meat in this story that's interesting. But, my first comment was: "I'm finding myself surprisingly impressed about some aesthetic/stylistic choices here, which I'm surprised I haven't seen before in AI Takeoff Fiction."
In normal English phrasing across multiple paragraphs, there's a sort of rise-and-fall of tension. You establish a minor conflict, confusion, or an open loop of curiosity, and then something happens that resolves it a bit. This isn't just about the content of 'what happens', but also what sort of phrasing one us...
I found this a surprisingly obvious set of strategic considerations (and meta-considerations), that for some reason I'd never seen anyone actually attempt to tackle before.
I found the notion of practicing "no cost too large" periods quite interesting. I'm somewhat intimidated by the prospect of trying it out, but it does seem like a good idea.
Seems true, but also didn't seem to be what this post was about?
On the meta-side: an update I made writing this comment is that inline-google-doc-style commenting is pretty important. It allows you to tag a specific part of the post and say "hey, this seems wrong/confused" without making that big a deal about it, whereas writing a LW comment you sort of have to establish the context, which intrinsically means making it into A Thing.
(I tried writing up comments here as if I were commenting on a google doc, rather than a LW post, as part of an experiment I had talked about with AdamShimi. I found that actually it was fairly hard – both because I couldn't make quick comments on a given section without it feeling like a bigger-deal than I meant it to be, and also because the overall thing came out more critical feeling than feels right on a public post. This is ironic since I was the one who told Adam "I bet if you just ask people to comment on it as if it's a google doc it'll go fi...
I had formed an impression that the hope was that the big chain of short thinkers would in fact do a good enough job factoring their goals that it would end up comparable to one human thinking for a long time (and that Ought was founded to test that hypothesis)
This post laid out some important arguments pretty clearly.
I think there are a number of features LW could build to improve this situation, but I'm first curious for more detail on “what feels wrong about explicitly asking individuals for feedback after posting on AF”, similar to how you might ask for feedback on a gDoc?
Okay, so now having thought about this a bit...
I at first read this and was like "I'm confused – isn't this what the whole agent foundations agenda is for? Like, I know there are still kinks to work out, and some of these kinks are major epistemological problems. But... I thought this specific problem was not actually that confusing anymore."
"Don't have your AGI go off and do stupid things" is a hard problem, but it seemed basically to be restating "the alignment problem is hard, for lots of finicky confusing reasons."
Then I realized "holy christ most AGI ...
Yeah I'm interested in chatting about this.
I feel I should disclaim "much of what I'd have to say about this is a watered down version of whatever Andrew Critch would say". He's busy a lot, but if you haven't chatted with him about this yet you probably should, and if you have I'm not sure whether I'll have much to add.
But I am pretty interested right now in fleshing out my own coordination principles and fleshing out my understanding of how they scale up from "200 human rationalists" to 1000-10,000 sized coalitions to All Humanity and to AGI and beyond. I'm currently working on a sequence that could benefit from chatting with other people who think seriously about this.
I was confused about this post, and... I might have resolved my confusion by the time I got ready to write this comment. Unsure. Here goes:
My first* thought:
Am I not just allowed to precommit to "be the sort of person who always figures out whatever the optimal game theory was, and commit to that?". I thought that was the point.
i.e. I wouldn't precommit to treating either the Nash Bargaining Solution or Kalai-Smorodinsky Solution as "the permanent grim trigger bullying point", I'd precommit to something like "have a meta-policy of not giving int...
I think I have juuust enough background to follow the broad strokes of this post, but not to quite grok the parts I think Abram was most interested in.
It definitely caused me to think about credit assignment. I actually ended up thinking about it largely through the lens of Moral Mazes (where challenges of credit assignment combine with other forces to create a really bad environment). Re-reading this post, while I don't quite follow everything, I do successfully get a taste of how credit assignment fits into a bunch of different domains.
For the "myop...
This feels like an important question in Robust Agency and Group Rationality, which are major topics of my interest.
This post feels probably important but I don't know that I actually understood it or used it enough to feel right nominating it myself. But, bumping it a bit to encourage others to look into it.
This post is a great tutorial on how to run a research group.
My main complaint about it is that it had the potential to be a way more general post that was obviously relevant to anyone building a serious intellectual community, but the framing makes it feel only relevant to Alignment research.
Curated, for several reasons.
I think it's really hard to figure out how to help with beneficial AI. Various career and research paths vary in how likely they are to help, or harm, or fit together. I think many prominent thinkers in the AI landscape have developed nuanced takes on how to think about the evolving landscape, but often haven't written up those thoughts.
I like this post both for laying out a lot of object-level thoughts about that, and also for demonstrating a possible framework for organizing those object-level thoughts, and for doing it...
Curated. This post does a good job of summarizing a lot of complex material, in a (moderately) accessible fashion.
I'm assuming part of the point is the LW crosspost still buries things in a hard-to-navigate google doc, which prevents it from easily getting cited or going viral, and Ajeya is asking/hoping for trust that they can get the benefit of some additional review from a wider variety of sources.