There's a lot of intellectual meat in this story that's interesting. But, my first comment was: "I'm finding myself surprisingly impressed about some aesthetic/stylistic choices here, which I'm surprised I haven't seen before in AI Takeoff Fiction."
In normal English phrasing across multiple paragraphs, there's a sort of rise-and-fall of tension. You establish a minor conflict, confusion, or an open loop of curiosity, and then something happens that resolves it a bit. This isn't just about the content of 'what happens', but also what sort of phrasing one us... (read more)
Curated.
I found this a surprisingly obvious set of strategic considerations (and meta-considerations), that for some reason I'd never seen anyone actually attempt to tackle before.
I found the notion of practicing "no cost too large" periods quite interesting. I'm somewhat intimidated by the prospect of trying it out, but it does seem like a good idea.
Seems true, but also didn't seem to be what this post was about?
On the meta-side: an update I made writing this comment is that inline-google-doc-style commenting is pretty important. It allows you to tag a specific part of the post and say "hey, this seems wrong/confused" without making that big a deal about it, whereas when writing a LW comment you sort of have to establish the context, which intrinsically means making it into A Thing.
(I tried writing up comments here as if I were commenting on a google doc, rather than a LW post, as part of an experiment I had talked about with AdamShimi. I found that actually it was fairly hard – both because I couldn't make quick comments on a given section without it feeling like a bigger deal than I meant it to be, and also because the overall thing came out more critical-feeling than feels right on a public post. This is ironic since I was the one who told Adam "I bet if you just ask people to comment on it as if it's a google doc it'll go fi... (read more)
I had formed an impression that the hope was that the big chain of short thinkers would in fact do a good enough job factoring their goals that it would end up comparable to one human thinking for a long time (and that Ought was founded to test that hypothesis)
Curated.
This post laid out some important arguments pretty clearly.
I think there are a number of features LW could build to improve this situation, but first I'm curious for more detail on “what feels wrong about explicitly asking individuals for feedback after posting on AF”, similar to how you might ask for feedback on a gDoc?
Okay, so now having thought about this a bit...
I at first read this and was like "I'm confused – isn't this what the whole agent foundations agenda is for? Like, I know there are still kinks to work out, and some of these kinks are major epistemological problems. But... I thought this specific problem was not actually that confusing anymore."
"Don't have your AGI go off and do stupid things" is a hard problem, but it seemed basically to be restating "the alignment problem is hard, for lots of finnicky confusing reasons."
Then I realized "holy christ most AGI ... (read more)
Yeah I'm interested in chatting about this.
I feel I should disclaim "much of what I'd have to say about this is a watered down version of whatever Andrew Critch would say". He's busy a lot, but if you haven't chatted with him about this yet you probably should, and if you have I'm not sure whether I'll have much to add.
But I am pretty interested right now in fleshing out my own coordination principles and fleshing out my understanding of how they scale up from "200 human rationalists" to 1000-10,000 sized coalitions to All Humanity and to AGI and beyond. I'm currently working on a sequence that could benefit from chatting with other people who think seriously about this.
I was confused about this post, and... I might have resolved my confusion by the time I got ready to write this comment. Unsure. Here goes:
My first* thought:
Am I not just allowed to precommit to "be the sort of person who always figures out whatever the optimal game theory is, and commit to that"? I thought that was the point.
i.e. I wouldn't precommit to treating either the Nash Bargaining Solution or Kalai-Smorodinsky Solution as "the permanent grim trigger bullying point", I'd precommit to something like "have a meta-policy of not giving int... (read more)
I think I have juuust enough background to follow the broad strokes of this post, but not to quite grok the parts I think Abram was most interested in.
It definitely caused me to think about credit assignment. I actually ended up thinking about it largely through the lens of Moral Mazes (where challenges of credit assignment combine with other forces to create a really bad environment). Re-reading this post, while I don't quite follow everything, I do successfully get a taste of how credit assignment fits into a bunch of different domains.
For the "myop... (read more)
This feels like an important question in Robust Agency and Group Rationality, which are major topics of my interest.
This post feels probably important but I don't know that I actually understood it or used it enough to feel right nominating it myself. But, bumping it a bit to encourage others to look into it.
This post is a great tutorial on how to run a research group.
My main complaint about it is that it had the potential to be a way more general post that was obviously relevant to anyone building a serious intellectual community, but the framing makes it feel only relevant to Alignment research.
Curated, for several reasons.
I think it's really hard to figure out how to help with beneficial AI. Various career and research paths vary in how likely they are to help, or harm, or fit together. I think many prominent thinkers in the AI landscape have developed nuanced takes on how to think about the evolving landscape, but often haven't written up those thoughts.
I like this post both for laying out a lot of object-level thoughts about that, and also for demonstrating a possible framework for organizing those object-level thoughts, and for doing it... (read more)
Curated. This post does a good job of summarizing a lot of complex material, in a (moderately) accessible fashion.
I'm assuming part of the point is the LW crosspost still buries things in a hard-to-navigate google doc, which prevents it from easily getting cited or going viral, and Ajeya is asking/hoping for trust that they can get the benefit of some additional review from a wider variety of sources.
Curated.
I think this was a quite interesting experiment in LW Post format. Getting to see everyone's probability-distributions in visual graph format felt very different from looking at a bunch of numbers in a list, or seeing them averaged together. I especially liked some of the weirder shapes of some people's curves.
This is a bit of an offbeat curation, but I think it's good to periodically highlight experimental formats like this.
Am I correct that the real generating rule here is something like "I have a group of people who'd like to work on some alignment open problems, and want a problem that is a) easy to give my group, and b) easy to subdivide once given to my group?"
Fwiw I recently listened to the excellent song 'The Good Doctor' which has me quite delighted to get random megaman references.
(Flagging that I curated the post, but was mostly relying on Ben and Habryka's judgment, in part since I didn't see much disagreement. Since this discussion I've become more agnostic about how important this post is)
One thing this comment makes me want is more nuanced reacts, so that people have an affordance to communicate how they feel about a post in a way that's easier to aggregate.
Though I also notice that with this particular post it's a bit unclear which react would be appropriate, since it sounds like it's not "disagree" so much as "this post seems confused" or something.
The thing I meant by "catastrophic" is "leading to the death of the organism."
This doesn't seem like what it should mean here. I'd think catastrophic in the context of "how humans (programmed by evolution) might fail by evolution's standards" should mean "start pursuing strategies that don't result in many children or longterm population success." (where premature death of the organism might be one way to cause that, but not the only way)
Curated. [Edit: no longer particularly endorsed in light of Rohin's comment, although I also have not yet really vetted Rohin's comment either and currently am agnostic on how important this post is]
When I first started following LessWrong, I thought the sequences made a good theoretical case for the difficulties of AI Alignment. In the past few years we've seen more concrete, empirical examples of how AI progress can take shape and how that might be alarming. We've also seen more concrete simple examples of AI failure in the form of specification gaming a... (read more)
I'm wondering if the Rainforest thing is somehow tied to some other disagreements (between you/me or you/MIRI-cluster).
Where, something like "the fact that it requires some interpretive labor to model the Rainforest as an agent in the first place" is related to why it seems hard to be helpful to humans, i.e. humans aren't actually agents. You get an easier starting ground since we have the ability to write down goals and notice inconsistencies in them, but that's not actually that reliable. We are not in fact agents and we need to somehow build AIs that reliably seem good to us anyway.
(Curious if this feels relevant either to Rohin, or other "MIRI cluster" folk)
I think previously I read this partway through, and assumed it was long, and then stopped for some reason. Now I finally read it and found it a nice, short/sweet post.
I personally did find the Rainforest example fairly compelling. At first glance I think it feels a bit nonsensical to try to "help" a rainforest. But, I'm kinda worried that it'll turn out that it's not (much) less nonsensical to try to help a human, and figuring out how to help arbitrary non-obviously-agenty systems seems like it might be the sort of thing we have to understand.
That's an interesting point.
Curated.
I personally agree with the OP, and have found at least the US's response to Covid-19 fairly important for modeling how it might respond to AI. I also found it particularly interesting that it focused on the "Slow Takeoff" scenario. I wouldn't have thought to make that specific comparison, and found it surprisingly apt.
I also think that, regardless of whether one agrees with the OP, "how humanity collectively responded to Covid-19" is still important evidence in some form about how we can expect them to handle other catastrophes, and worth paying attention to, and perhaps debating.
Are you saying you think that wasn't a fair characterization of the FDA, or that the hypothetical AI Governance bodies would be different from the FDA?
(The statement was certainly not very fair to the FDA, and I do expect there was more going on under the hood than that motivation. But, I do broadly think governing bodies do what they are incentivized to do, which includes justifying themselves, especially after being around a couple decades and gradually being infiltrated by careerists)
I do definitely expect different institutional failure in the case of Soft Takeoff. But it sort of depends on what level of abstraction you're looking at the institutional failure through. Like, the FDA won't be involved. But there's a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of "Wield the one hammer we know how to wield to justify our jobs." (In a large company, it's possible that regulatory body could be a department inside the org, rather than a government agency)
In reasonably good outcomes…
Ah, okay. I think I need to at least think a bit harder to figure out if I still disagree in that case.
I think given that we didn't suppress COVID, mitigating its damage probably involved new problems that we didn't have solutions for before.
Hmm. This just doesn't seem like what was going on to me at all. I think I disagree a lot about this, and it seems less about "how things will shake out in Slow AI Takeoff" and more about "how badly and obviously-in-advance and easily-preventably did we screw up our covid response."
(I expect we also disagree about how Slow Takeoff would look, but I don't think that's the cruxy bit for me here).
I'm sort of hesitant…
1. Many new problems arose during this pandemic for which we did not have historical experience, e.g. in supply chains. (Perhaps we had historical precedent in the Spanish flu, but that was sufficiently long ago that I don’t expect those lessons to generalize, or for us to remember those lessons.) In contrast, I expect that with AI alignment the problems will not change much as the AI systems become more powerful. Certainly the effects of misaligned powerful AI systems will change dramatically and be harder to mitigate, but I expect the underlying causes o…
(serious question, I'm not sure what the right process here is)
What do you think should happen instead of "read through and object to Wei_Dai's existing blogposts?". Is there a different process that would work better? Or you think this generally isn't worth the time? Or you think Wei Dai should write a blogpost that more clearly passes your "sniff test" of "probably compelling enough to be worth more of my attention?"
Mostly "Wei Dai should write a blogpost that more clearly passes your "sniff test" of "probably compelling enough to be worth more of my attention"". And ideally a whole sequence or a paper.
It's possible that Wei has already done this, and that I just haven't noticed. But I had a quick look at a few of the blog posts linked in the "Disjunctive scenarios" post, and they seem to overall be pretty short and non-concrete, even for blog posts. Also, there are literally thirty items on the list, which makes it ha... (read more)
(note, this comment is kinda grumpy but, to be clear, comes from the context of me generally quite respecting you as a writer. :P)
I can't remember if I've complained about this elsewhere, but I have no idea what you mean by myopia, and I was about to comment (on another post) asking if you could write a post that succinctly defined what you meant by myopia (or if the point is that it's hard to define, say that explicitly and give a few short attempted descriptions that could help me triangulate it).
Then I searched to see if you'd already done that, and fou…
Sorry for somehow missing/ignoring this comment for about 5 months. The short answer is that I've been treating "myopia" as a focusing object, and am likely to think any definitions (including my own definitions in the OP) are too hasty and don't capture everything I want to point at. In fact I initially tried to use the new term "partial agency" to make sure people didn't think I was talking about more well-defined versions.
My attempt to give others a handle for the same focusing object was in the first post of the seque... (read more)
Pedagogical note: something that feels like it's missing from the fable is a "realistic" sense of how demons get created and how they can manipulate the hill.
Fortunately your subsequent real-world examples all have this, and, like, I did know what you meant. But it felt sort of arbitrary to have this combo of "Well, there's a very concrete, visceral example of the ball rolling downhill – I know what that means. But then there are some entities that can arbitrarily shape the hill. Why are the demons weak at the beginning and stronger the more you fold…
Which you could round off to "biologists don't need to know about evolution", in the sense that it is not the best use of their time.
The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.
I guess the main thing I want is an actual tally on "how many people definitively found this post to represent their crux", vs "how many people think that this represented other people's cruxes"
Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I'm not sure he qualifies as a "miri person")
I just wanted to flag that this post hasn't been reviewed yet, despite being one of the most nominated posts. (And most of the nominations here are quite short).
The most obvious sort of review that'd be good to see is from people who were in this post's target demographic (i.e. people who hadn't understood or been unpersuaded about what sort of problem MIRI is trying to solve), about whether this post actually helped them understand that.
I'd also be interested in reviews that grapple a bit more with "how well exactly does this metaphor hold up?", al…
At the time I began writing this previous comment, I felt like I hadn't directly gotten that much use out of this post. But then after reflecting a bit about Beyond Astronomical Waste I realized this had actually been a fairly important concept in some of my other thinking.
I think that at the time this post came out, I didn't have the mental scaffolding necessary to really engage with it – I thought of this question as maybe important, but sort of "above my paygrade", something better left to other people who would have the resources to engage more seriously with it.
But, over the past couple years, the concepts here have formed an important component of my understanding of robust agency. Much of this came from private in-person conversations, but this post is the best writeup of the concept I'm cur... (read more)
(5 upvotes from a few AF users suggests this post probably should be nominated by an additional AF person, but unsure. I do apologize again for not having better nomination-endorsement-UI.
I think this post may have been relevant to my own thinking, but I'm particularly interested in how relevant the concept has been to other people who think professionally about alignment)
A reminder, since this looks like it has a few upvotes from AF users: posts need 2 nominations to proceed to the review round.
I'm not sure I understand the difference between this worldview and my own. (The phrase-in-italics in your comment seemed fairly integral to how I was thinking about alignment/capabilities in the first place).
This recent comment of yours seems more relevant as far as worldview differences go, i.e. 'if you expect discontinuous takeoff, then transparency is unlikely to do what you want'. (some slightly more vague "what counts as a clever argument" disagreement might be relevant too, although I'm not sure I can state my worry cr... (read more)
Sort of a side point, but something that's been helpful to me in this post and others in the past year is reconceptualizing the Fast/Slow takeoff into "Continuous" vs "Hard" takeoff, which suggest different strategic considerations. This particular post helped flesh out some of my models of what considerations are at play.
Is it a correct summary of the final point: "either this doesn't really impact the field, so it doesn't increase capabilities; or, it successfully moves the ML field from 'everything is opaque ... (read more)
Yep, I think that's a correct summary of the final point.
The main counterpoint that comes to mind is a possible world where "opaque AIs" just can't ever achieve general intelligence, but moderately well-thought-out AI designs can bridge the gap to "general intelligence/agency" without being reliable enough to be aligned.
Well, we know it's possible to achieve general intelligence via dumb black box search—evolution did it—and we've got lots of evidence for current black box approaches being quite powerful. So it seems unlikely to me that we "just can't
I'm not sure what we'll end up settling on for "regular Open Threads" vs shortform. Open Threads predate shortform, but didn't create the particular feeling of a person-space like shortform does, so it seemed useful to add shortform. I'm not sure if Open Threads still provide a particular service that shortform doesn't provide.
In _this_ case, however, I think the Alignment Open Thread serves a bit of a different purpose – it's a place to spark low-key conversation between AF members. (Non-AF members can c... (read more)
I'd bet $5 this was intentional.
Planning of vengeance continues apace, either way.
I'd lean towards just encouraging people to comment on the original post (although if there end up being any significant number of comments here I agree with your suggestion)
Curated. I appreciated this post for a combination of:
I also wanted to highlight this section:
... (read more)