All of Raemon's Comments + Replies

AMA: Paul Christiano, alignment researcher

Curated. I don't think we've curated an AMA before, and I'm not sure I have a principled opinion on doing that, but this post seems chock full of small useful insights, and fragments of ideas that seem like they might otherwise take a while to get written up more comprehensively, which I think is good.

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Curated. I appreciated this post for a combination of:

  • laying out several concrete stories about how AI could lead to human extinction
  • laying out a frame for how to think about those stories (while acknowledging other frames one could apply to them)
  • linking to a variety of research, with more thoughts on what sort of further research might be helpful.

I also wanted to highlight this section:

Finally, I should also mention that I agree with Tom Dietterich’s view (dietterich2019robust) that we should make AI safer to society by learning from high-reliability organiz

... (read more)
Another (outer) alignment failure story

There's a lot of intellectual meat in this story that's interesting. But, my first comment was: "I'm finding myself surprisingly impressed about some aesthetic/stylistic choices here, which I'm surprised I haven't seen before in AI Takeoff Fiction."

In normal English phrasing across multiple paragraphs, there's a sort of rise-and-fall of tension. You establish a minor conflict, confusion, or an open loop of curiosity, and then something happens that resolves it a bit. This isn't just about the content of 'what happens', but also what sort of phrasing one us... (read more)

How do we prepare for final crunch time?


I found this a surprisingly obvious set of strategic considerations (and meta-considerations), that for some reason I'd never seen anyone actually attempt to tackle before.

I found the notion of practicing "no cost too large" periods quite interesting. I'm somewhat intimidated by the prospect of trying it out, but it does seem like a good idea.

How do we prepare for final crunch time?

Seems true, but also didn't seem to be what this post was about?

Epistemological Framing for AI Alignment Research

On the meta side: an update I made writing this comment is that inline-google-doc-style commenting is pretty important. It allows you to tag a specific part of the post and say "hey, this seems wrong/confused" without making that big a deal about it, whereas writing a LW comment you sort of have to establish the context, which intrinsically means making it into A Thing.

Epistemological Framing for AI Alignment Research

(I tried writing up comments here as if I were commenting on a google doc, rather than a LW post, as part of an experiment I had talked about with AdamShimi. I found that actually it was fairly hard – both because I couldn't make quick comments on a given section without it feeling like a bigger deal than I meant it to be, and also because the overall thing came out more critical-feeling than feels right on a public post. This is ironic since I was the one who told Adam "I bet if you just ask people to comment on it as if it's a google doc it'll go fi... (read more)

The case for aligning narrowly superhuman models

I had formed an impression that the hope was that the big chain of short thinkers would in fact do a good enough job factoring their goals that it would end up comparable to one human thinking for a long time (and that Ought was founded to test that hypothesis)

7Paul Christiano8moThat's what I have in mind. If all goes well you can think of it like "a human thinking a long time." We don't know if all will go well. It's also not really clear what "a human thinking 10,000 years" means, HCH is kind of an operationalization of that, but there's a presumption of alignment in the human-thinking-a-long-time that we don't get for free here. (Of course you also wouldn't get it for free if you somehow let a human live for 10,000 years...)
adamShimi's Shortform

I think there are a number of features LW could build to improve this situation, but first I'm curious for more detail on "what feels wrong about explicitly asking individuals for feedback after posting on AF", similar to how you might ask for feedback on a gDoc?

5Steve Byrnes8moNot Adam, but 1. Maybe there's a sense in which everyone has already implicitly declared that they don't want to give feedback, because they could have if they wanted to, so it feels like more of an imposition. 2. Maybe it feels like "I want feedback for my own personal benefit" when it's already posted, as opposed to "I want feedback to improve this document which I will share with the community" when it's not yet posted. So it feels more selfish, instead of part of a community project. For that problem, maybe you'd want to frame it as "I'm planning to rewrite this post / write a follow-up to this post / give a talk based on this post / etc., can you please offer feedback on this post to help me with that?" (Assuming that's in fact the case, of course, but most posts have follow-up posts...)
The Commitment Races problem

Okay, so now having thought about this a bit...

I at first read this and was like "I'm confused – isn't this what the whole agent foundations agenda is for? Like, I know there are still kinks to work out, and some of these kinks are major epistemological problems. But... I thought this specific problem was not actually that confusing anymore."

"Don't have your AGI go off and do stupid things" is a hard problem, but it seemed basically to be restating "the alignment problem is hard, for lots of finnicky confusing reasons."

Then I realized "holy christ most AGI ... (read more)

The Commitment Races problem

Yeah I'm interested in chatting about this. 

I feel I should disclaim "much of what I'd have to say about this is a watered down version of whatever Andrew Critch would say". He's busy a lot, but if you haven't chatted with him about this yet you probably should, and if you have I'm not sure whether I'll have much to add.

But I am pretty interested right now in fleshing out my own coordination principles and fleshing out my understanding of how they scale up from "200 human rationalists" to 1000-10,000 sized coalitions to All Humanity and to AGI and beyond. I'm currently working on a sequence that could benefit from chatting with other people who think seriously about this.

The Commitment Races problem

I was confused about this post, and... I might have resolved my confusion by the time I got ready to write this comment. Unsure. Here goes:

My first* thought: 

Am I not just allowed to precommit to "be the sort of person who always figures out whatever the optimal game theory is, and commit to that"? I thought that was the point.

i.e. I wouldn't precommit to treating either the Nash Bargaining Solution or Kalai-Smorodinsky Solution as "the permanent grim trigger bullying point", I'd precommit to something like "have a meta-policy of not giving int... (read more)

4Daniel Kokotajlo9moThanks! Reading this comment makes me very happy, because it seems like you are now in a similar headspace to me back in the day. Writing this post was my response to being in this headspace. This sounds like a plausibly good rule to me. But that doesn't mean that every AI we build will automatically follow it. Moreover, thinking about acausal trade is in some sense engaging in acausal trade. As I put it: As for your handwavy proposals, I do agree that they are pretty good. They are somewhat similar to the proposals I favor, in fact. But these are just specific proposals in a big space of possible strategies, and (a) we have reason to think there might be flaws in these proposals that we haven't discovered yet, and (b) even if these proposals work perfectly there's still the problem of making sure that our AI follows them: If you want to think and talk more about this, I'd be very interested to hear your thoughts. Unfortunately, while my estimate of the commitment races problem's importance has only increased over the past year, I haven't done much to actually make intellectual progress on it.
The Credit Assignment Problem

I think I have juuust enough background to follow the broad strokes of this post, but not to quite grok the parts I think Abram was most interested in. 

It definitely caused me to think about credit assignment. I actually ended up thinking about it largely through the lens of Moral Mazes (where challenges of credit assignment combine with other forces to create a really bad environment). Re-reading this post, while I don't quite follow everything, I do successfully get a taste of how credit assignment fits into a bunch of different domains.

For the "myop... (read more)

The Commitment Races problem

This feels like an important question in Robust Agency and Group Rationality, which are major topics of my interest.

Why Subagents?

This post feels probably important but I don't know that I actually understood it or used it enough to feel right nominating it myself. But, bumping it a bit to encourage others to look into it.

Alignment Research Field Guide

This post is a great tutorial on how to run a research group. 

My main complaint about it is that it had the potential to be a way more general post that was obviously relevant to anyone building a serious intellectual community, but the framing makes it feel only relevant to Alignment research.

Some AI research areas and their relevance to existential safety

Curated, for several reasons.

I think it's really hard to figure out how to help with beneficial AI. Various career and research paths vary in how likely they are to help, or harm, or fit together. I think many prominent thinkers in the AI landscape have developed nuanced takes on how to think about the evolving landscape, but often haven't written up those thoughts. 

I like this post both for laying out a lot of object-level thoughts about that, and also for demonstrating a possible framework for organizing those object-level thoughts, and for doing it... (read more)

The Solomonoff Prior is Malign

Curated. This post does a good job of summarizing a lot of complex material, in a (moderately) accessible fashion.

4Ben Pace1y+1 I already said I liked it, but this post is great and will immediately be the standard resource on this topic. Thank you so much.
Draft report on AI timelines

I'm assuming part of the point is the LW crosspost still buries things in a hard-to-navigate google doc, which prevents it from easily getting cited or going viral, and Ajeya is asking/hoping for trust that they can get the benefit of some additional review from a wider variety of sources.

Forecasting Thread: AI Timelines


I think this was a quite interesting experiment in LW Post format. Getting to see everyone's probability-distributions in visual graph format felt very different from looking at a bunch of numbers in a list, or seeing them averaged together. I especially liked some of the weirder shapes of some people's curves.

This is a bit of an offbeat curation, but I think it's good to periodically highlight experimental formats like this.

What's a Decomposable Alignment Topic?

Am I correct that the real generating rule here is something like "I have a group of people who'd like to work on some alignment open problems, and want a problem that is a) easy to give my group, and b) easy to subdivide once given to my group?"

3elriggs1yb) seems right. I'm unsure what (a) could mean (not much overhead?). I feel confused trying to think about decomposability w/o considering the capabilities of the people I'm handing the tasks off to. I would only add: since that makes the capabilities explicit.
Will OpenAI's work unintentionally increase existential risks related to AI?

Fwiw I recently listened to the excellent song 'The Good Doctor' which has me quite delighted to get random megaman references.

Matt Botvinick on the spontaneous emergence of learning algorithms

(Flagging that I curated the post, but was mostly relying on Ben and Habryka's judgment, in part since I didn't see much disagreement. Since this discussion I've become more agnostic about how important this post is)

One thing this comment makes me want is more nuanced reacts that people have affordance to communicate how they feel about a post, in a way that's easier to aggregate.

Though I also notice that with this particular post it's a bit unclear which react would be appropriate, since it sounds like it's not "disagree" so much as "this post seems confused" or something.

2Matthew "Vaniver" Graves1yFWIW, I appreciated that your curation notice explicitly includes the desire for more commentary on the results, and that curating it seems to have been a contributor to there being more commentary.
Matt Botvinick on the spontaneous emergence of learning algorithms

The thing I meant by "catastrophic" is "leading to the death of the organism."

This doesn't seem like what it should mean here. I'd think catastrophic in the context of "how humans (programmed by evolution) might fail by evolution's standards" should mean "start pursuing strategies that don't result in many children or longterm population success." (where premature death of the organism might be one way to cause that, but not the only way)

1Adam Scholl1yI agree, in the case of evolution/humans. In the text above, I meant to highlight what seemed to me like a relative lack of catastrophic *within-mind* inner alignment failures, e.g. due to conflicts between PFC and DA. Death of the organism feels to me like a reasonable way to operationalize "catastrophic" in these cases, but I can imagine other reasonable ways.
Matt Botvinick on the spontaneous emergence of learning algorithms

Curated. [Edit: no longer particularly endorsed in light of Rohin's comment, although I also have not yet really vetted Rohin's comment either and currently am agnostic on how important this post is]

When I first started following LessWrong, I thought the sequences made a good theoretical case for the difficulties of AI Alignment. In the past few years we've seen more concrete, empirical examples of how AI progress can take shape and how that might be alarming. We've also seen more concrete simple examples of AI failure in the form of specification gaming a... (read more)

Our take on CHAI’s research agenda in under 1500 words

I'm wondering if the Rainforest thing is somehow tied to some other disagreements (between you/me or you/MIRI-cluster).

Where, something like "the fact that it requires some interpretive labor to model the Rainforest as an agent in the first place" is related to why it seems hard to be helpful to humans, i.e. humans aren't actually agents. You get an easier starting ground since we have the ability to write down goals and notice inconsistencies in them, but that's not actually that reliable. We are not in fact agents, and we need to somehow build AIs that reliably seem good to us anyway.

(Curious if this feels relevant either to Rohin, or other "MIRI cluster" folk)

1Alex Flint1yWell, yes, one way to help some living entity is to (1) interpret it as an agent, and then (2) act in service of the terminal goals of that agent. But that's not the only way to be helpful. It may also be possible to directly be helpful to a living entity that is not an agent, without getting any agent concepts involved at all. I definitely don't know how to do this, but the route that avoids agent models entirely seems more plausible to me compared to working hard to interpret everything using some agent model that is often a really poor fit, and then helping on the basis of that poorly-fitting agent model. I'm excited about inquiring deeply into what the heck "help" means. (All please reach out to me if you'd like to join a study seminar on this topic)
2Rohin Shah1yI share Alex's intuition in a sibling comment: Yes, there is interpretive labor, and yes, things become fuzzy as situations become more and more extreme, but if you want to help an agent-ish thing it shouldn't be too hard to add some value and not cause massive harm. I expect MIRI-cluster to agree with this point -- think of the sentiment "the AI knows what you want it to do, it just doesn't care". The difficulty isn't in being competent enough to help humans, it's in being motivated to help humans. (If you thought that we had to formally define everything and prove theorems w.r.t the formal definitions or else we're doomed, then you might think that the fact that humans aren't clear agents poses a problem; that might be one way that MIRI-cluster and I disagree.) I could imagine that for some specific designs for AI systems you could say that they would fail to help humans because they make a false assumption of too-much-agentiness. If the plan was "literally run an optimal strategy pair for an assistance game (CIRL)", I think that would be a correct critique -- most egregiously, CIRL assumes a fixed reward function, but humans change over time. But I don't see why it would be true for the "default" intelligent AI system.
Our take on CHAI’s research agenda in under 1500 words

I think previously I read this partway through, and assumed it was long, and then stopped for some reason. Now I finally read it and found it a nice, short/sweet post. 

I personally did find the Rainforest example fairly compelling. At first glance I think it feels a bit nonsensical to try to "help" a rainforest. But, I'm kinda worried that it'll turn out that it's not (much) less nonsensical to try to help a human, and figuring out how to help arbitrary non-obviously-agenty systems seems like it might be the sort of thing we have to understand.

1Alex Flint1yYeah this question of what it really means to help some non-pure-agent living entity seems more and more central to me. It also, unfortunately, seems more and more difficult. Another way that I state the question in order to meditate on it is: what does it mean to act out of true love?
Possible takeaways from the coronavirus pandemic for slow AI takeoff


I personally agree with the OP, and have found at least the US's response to Covid-19 fairly important for modeling how it might respond to AI. I also found it particularly interesting that it focused on the "Slow Takeoff" scenario. I wouldn't have thought to make that specific comparison, and found it surprisingly apt. 

I also think that, regardless of whether one agrees with the OP, I think "how humanity collectively responded to Covid-19" is still important evidence in some form about how we can expect them to handle other catastrophes, and worth paying attention to, and perhaps debating.

Possible takeaways from the coronavirus pandemic for slow AI takeoff

Are you saying you think that wasn't a fair characterization of the FDA, or that the hypothetical AI Governance bodies would be different from the FDA?

(The statement was certainly not very fair to the FDA, and I do expect there was more going on under the hood than that motivation. But, I do broadly think governing bodies do what they are incentivized to do, which includes justifying themselves, especially after being around a couple decades and gradually being infiltrated by careerists)

2Rohin Shah1yI am mostly confused, but I expect that if I learned more I would say that it wasn't a fair characterization of the FDA.
Possible takeaways from the coronavirus pandemic for slow AI takeoff

I do definitely expect different institutional failure in the case of Soft Takeoff. But it sort of depends on what level of abstraction you're looking at the institutional failure through. Like, the FDA won't be involved. But there's a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of "Wield the one hammer we know how to wield to justify our jobs." (In a large company, it's possible that regulatory body could be a department inside the org, rather than a government agency)

In reasonably good outcomes

... (read more)
2Rohin Shah1yYeah, these sorts of stories seem possible, and it also seems possible that institutions try some terrible policies, notice that they're terrible, and then fix them. Like, this description: just doesn't seem to match my impression of non-EAs-or-rationalists working on AI governance. It's possible that people in government are much less competent than people at think tanks, but this would be fairly surprising to me. In addition, while I can't explain FDA decisions, I still pretty strongly penalize views that ascribe huge very-consequential-by-their-goals irrationality to small groups of humans working full time on something. (Note I would defend the claim that institutions work well enough that in a slow takeoff world the probability of extinction is < 80%, and probably < 50%, just on the basis that if AI alignment turned out to be impossible, we can coordinate not to build powerful AI.)
Possible takeaways from the coronavirus pandemic for slow AI takeoff

Ah, okay. I think I need to at least think a bit harder to figure out if I still disagree in that case. 

3Raymond Arnold1yI do definitely expect different institutional failure in the case of Soft Takeoff. But it sort of depends on what level of abstraction you're looking at the institutional failure through. Like, the FDA won't be involved. But there's a decent chance that some other regulatory body will be involved, which is following the underlying FDA impulse of "Wield the one hammer we know how to wield to justify our jobs." (In a large company, it's possible that regulatory body could be a department inside the org, rather than a government agency) In reasonably good outcomes, the decisions are mostly being made by tech companies full of specialists who well understand the problem. In that case the institutional failures will look more like "what ways do tech companies normally screw up due to internal politics?" There's a decent chance the military or someone will try to commandeer the project, in which case more typical government institutional failures will become more relevant. One thing that seems significant is that 2 years prior to The Big Transition, you'll have multiple companies with similar-ish tech. And some of them will be appropriately cautious (like New Zealand, Singapore), and others will not have the political wherewithal to slow down and think carefully and figure out what inconvenient things they need to do and do them (like many other countries in covid)
Possible takeaways from the coronavirus pandemic for slow AI takeoff

I think given that we didn't suppress COVID, mitigating its damage probably involved new problems that we didn't have solutions for before.

Hmm. This just doesn't seem like what was going on to me at all. I think I disagree a lot about this, and it seems less about "how things will shake out in Slow AI Takeoff" and more about "how badly and obviously-in-advance and easily-preventably did we screw up our covid response."

(I expect we also disagree about how Slow Takeoff would look, but I don't think that's the cruxy bit for me here). 

I'm sort of hesitant

... (read more)
2Rohin Shah1yAh, I see. I agree with this and do think it cuts against my point #1, but not points #2 and #3. Edited the top-level comment to note this. Tbc, I find it quite likely that there was mass institutional failure with COVID; I'm mostly arguing that soft takeoff is sufficiently different from COVID that we shouldn't necessarily expect the same mass institutional failure in the case of soft takeoff. (This is similar to Matthew's argument that the pandemic shares more properties with fast takeoff than with slow takeoff.)
Possible takeaways from the coronavirus pandemic for slow AI takeoff

1. Many new problems arose during this pandemic for which we did not have historical experience, e.g. in supply chains. (Perhaps we had historical precedent in the Spanish flu, but that was sufficiently long ago that I don’t expect those lessons to generalize, or for us to remember those lessons.) In contrast, I expect that with AI alignment the problems will not change much as the AI systems become more powerful. Certainly the effects of misaligned powerful AI systems will change dramatically and be harder to mitigate, but I expect the underlying causes o

... (read more)
2Rohin Shah1yRelative to our position now, there will be more novel problems from powerful AI systems than for COVID. Relative to our position e.g. two years before the "point of no return" (perhaps the deployment of the AI system that will eventually lead to extinction), there will be fewer novel problems than for COVID, at least if we are talking about the underlying causes of misalignment. (The difference is that with AI alignment we're trying to prevent misaligned powerful AI systems from being deployed, whereas with pandemics we don't have the option of preventing "powerful diseases" from arising; we instead have to mitigate their effects.) I agree that powerful AI systems will lead to more novel problems in their effects on society than COVID did, but that's mostly irrelevant if your goal is to make sure you don't have a superintelligent AI system that is trying to hurt you. I think it is plausible that we "could have" completely suppressed COVID, and that mostly wouldn't have required facts we didn't know, and the fact that we didn't do that is at least a weak sign of inadequacy. I think given that we didn't suppress COVID, mitigating its damage probably involved new problems that we didn't have solutions for before. As an example, I would guess that in past epidemics the solution to "we have a mask shortage" would have been "buy masks from <country without the epidemic>", but that no longer works for COVID. But really the intuition is more like "life is very different in this pandemic relative to previous epidemics; it would be shocking if this didn't make the problem harder in some way that we failed to foresee".
AGIs as collectives

(serious question, I'm not sure what the right process here is)

What do you think should happen instead of "read through and object to Wei_Dai's existing blogposts?". Is there a different process that would work better? Or you think this generally isn't worth the time? Or you think Wei Dai should write a blogpost that more clearly passes your "sniff test" of "probably compelling enough to be worth more of my attention?"

Mostly "Wei Dai should write a blogpost that more clearly passes your "sniff test" of "probably compelling enough to be worth more of my attention"". And ideally a whole sequence or a paper.

It's possible that Wei has already done this, and that I just haven't noticed. But I had a quick look at a few of the blog posts linked in the "Disjunctive scenarios" post, and they seem to overall be pretty short and non-concrete, even for blog posts. Also, there are literally thirty items on the list, which makes it ha... (read more)

Defining Myopia

(note, this comment is kinda grumpy but, to be clear, comes from the context of me generally quite respecting you as a writer. :P)

I can't remember if I've complained about this elsewhere, but I have no idea what you mean by myopia, and I was about to comment (on another post) asking if you could write a post that succinctly defined what you meant by myopia (or if the point is that it's hard to define, say that explicitly and give a few short attempted descriptions that could help me triangulate it).

Then I searched to see if you'd already done that, and fou

... (read more)

Sorry for somehow missing/ignoring this comment for about 5 months. The short answer is that I've been treating "myopia" as a focusing object, and am likely to think any definitions (including my own definitions in the OP) are too hasty and don't capture everything I want to point at. In fact I initially tried to use the new term "partial agency" to make sure people didn't think I was talking about more well-defined versions.

My attempt to give others a handle for the same focusing object was in the first post of the seque... (read more)

Demons in Imperfect Search

Pedagogical note: something that feels like it's missing from the fable is a "realistic" sense of how demons get created and how they can manipulate the hill. 

Fortunately your subsequent real-world examples all have this, and, like, I did know what you meant. But it felt sort of arbitrary to have this combo of "Well, there's a very concrete, visceral example of the ball rolling downhill – I know what that means. But then there are some entities that can arbitrarily shape the hill. Why are the demons weak at the beginning and stronger the more you fold

... (read more)
3johnswentworth2yUpdated the long paragraph in the fable a bit, hopefully that will help somewhat. It's hard to make it really concrete when I don't have a good mathematical description of how these things pop up; I'm not sure which aspects of the environment make it happen, so I don't know what to emphasize.
Realism about rationality

Which you could round off to "biologists don't need to know about evolution", in the sense that it is not the best use of their time.

The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.

2Rohin Shah2ySee response to Daniel below; I find this one a little compelling (but not that much).
Realism about rationality

I guess the main thing I want is an actual tally on "how many people definitively found this post to represent their crux", vs "how many people think that this represented other people's cruxes"

3Rohin Shah2yIf I believed realism about rationality, I'd be closer to buying what I see as the MIRI story for impact. It's hard to say whether I'd actually change my mind without knowing the details of what exactly I'm updating to.
Realism about rationality

Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I'm not sure he qualifies as a "miri person")

5DanielFilan2y
  • I believe in some form of rationality realism: that is, that there's a neat mathematical theory of ideal rationality that's in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
  • If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.
  • Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:
    • I'd be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
    • I'd be less interested in probabilistic forecasting of things.
    • I'd want to find some higher-level thing that was more 'real'/mathematically characterisable, and study that instead.
    • I'd be less optimistic about the prospects for an 'ideal' decision and reasoning theory.
  • My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.
1 · Raymond Arnold · 2y: I guess the main thing I want is an actual tally on "how many people definitively found this post to represent their crux", vs "how many people think that this represented other people's cruxes"
2 · DanielFilan · 2y: To answer the easy part of this question/remark, I don't work at MIRI and don't research agent foundations, so I think I shouldn't count as a "MIRI person", despite having good friends at MIRI and having interned there. (On a related note, it seems to me that the terminology "MIRI person"/"MIRI cluster" obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)
The Rocket Alignment Problem

I just wanted to flag that this post hasn't been reviewed yet, despite being one of the most-nominated posts. (And most of the nominations here are quite short.)

The most obvious sort of review that'd be good to see is from people who were in this post's target demographic (i.e. people who hadn't understood, or had been unpersuaded by, what sort of problem MIRI is trying to solve), about whether this post actually helped them understand that.

I'd also be interested in reviews that grapple a bit more with "how well exactly does this metaphor hold up?", al

... (read more)
Robustness to Scale

At the time I began writing this previous comment, I felt like I hadn't directly gotten that much use of this post. But then after reflecting a bit about Beyond Astronomical Waste I realized this had actually been a fairly important concept in some of my other thinking.

Beyond Astronomical Waste

I think that at the time this post came out, I didn't have the mental scaffolding necessary to really engage with it – I thought of this question as maybe important, but sort of "above my paygrade", something better left to other people who would have the resources to engage more seriously with it.

But, over the past couple years, the concepts here have formed an important component of my understanding of robust agency. Much of this came from private in-person conversations, but this post is the best writeup of the concept I'm cur... (read more)

Robustness to Scale

(5 upvotes from a few AF users suggest this post should probably be nominated by an additional AF person, but I'm unsure. I do apologize again for not having better nomination-endorsement UI.

I think this post may have been relevant to my own thinking, but I'm particularly interested in how relevant the concept has been to other people who think professionally about alignment)

3 · Buck Shlegeris · 2y: I think that the terms introduced by this post are great and I use them all the time.
Bottle Caps Aren't Optimisers

A reminder, since this looks like it has a few upvotes from AF users: posts need 2 nominations to proceed to the review round. 

Chris Olah’s views on AGI safety

I'm not sure I understand the difference between this worldview and my own. (The phrase-in-italics in your comment seemed fairly integral to how I was thinking about alignment/capabilities in the first place).

This recent comment of yours seems more relevant as far as worldview differences go, i.e. 'if you expect discontinuous takeoff, then transparency is unlikely to do what you want'. (some slightly more vague "what counts as a clever argument" disagreement might be relevant too, although I'm not sure I can state my worry cr... (read more)

5 · Rohin Shah · 2y: Fwiw, under the worldview I'm outlining, this sounds like a "clever argument" to me, that I would expect on priors to be less likely to be true, regardless of my position on takeoff. (Takeoff does matter, in that I expect that this worldview is not very accurate/good if there's discontinuous takeoff, but imputing the worldview I don't think takeoff matters.)

I often think of this as penalizing nth-order effects in proportion to some quickly-growing function of n. (Warning: I'm using the phrase "nth-order effects" in a non-standard, non-technical way.)

Under the worldview I mentioned, the first-order effect of better understanding of AI systems is that you are more likely to build AI systems that are useful and do what you want. The second-order effect is "maybe there's a regime where you can build capable-but-not-safe things; if we're currently below that, it's bad to go up into that regime". This requires a more complicated model of the world (given this worldview) and more assumptions of where we are.

(Also, now that I've written this out, the model also predicts there's no chance of solving alignment, because we'll first reach the capable-but-not-safe things, and die. Probably the best thing to do on this model is to race ahead on understanding as fast as possible, and hope we leapfrog directly to the capable-and-safe regime? Or you work on understanding AI in secret, and only release once you know how to do capable-and-safe, so that no one has the chance to work on capable-but-not-safe? You can see why this argument feels a bit off under the worldview I outlined.)
Chris Olah’s views on AGI safety

Sort of a side point, but something that's been helpful to me in this post and others in the past year is reconceptualizing the Fast/Slow takeoff into "Continuous" vs "Hard" takeoff, which suggest different strategic considerations. This particular post helped flesh out some of my models of what considerations are at play.

Is it a correct summary of the final point: "either this doesn't really impact the field, so it doesn't increase capabilities; or, it successfully moves the ML field from 'everything is opaque ... (read more)

5 · Rohin Shah · 2y: I think it shouldn't be in the "clever argument" category, and the only reason it feels like that is because you're using the capabilities-alignment framework. Consider instead this worldview:

The way you build things that are useful and do what you want is to understand how things work and put them together in a deliberate way. If you put things together randomly, they either won't work, or will have unintended side effects.

(This worldview can apply to far more than AI; e.g. it seems right in basically every STEM field. You might argue that putting things together randomly seems to work surprisingly well in AI, to which I say that it really doesn't, you just don't see all of the effort where you put things together randomly and it simply flat-out fails.)

The argument "it's good for people to understand AI techniques better even if it accelerates AGI" is a very straightforward non-clever consequence of this worldview.

Somewhat more broadly, I recommend being able to inhabit this other worldview. I expect it to be more useful / accurate than the capabilities / alignment worldview. (Disclaimer: I believed this point before this post -- in fact I had several conversations with people about it back in May, when I was considering a project with potential effects along these lines.)

Yep, I think that's a correct summary of the final point.

The main counterpoint that comes to mind is a possible world where "opaque AIs" just can't ever achieve general intelligence, but moderately well-thought-out AI designs can bridge the gap to "general intelligence/agency" without being reliable enough to be aligned.

Well, we know it's possible to achieve general intelligence via dumb black box search—evolution did it—and we've got lots of evidence for current black box approaches being quite powerful. So it seems unlikely to me that we "just can't

... (read more)
AI Alignment Open Thread October 2019

I'm not sure what we'll end up settling on for "regular Open Threads" vs shortform. Open Threads predate shortform, but didn't create the particular feeling of a person-space like shortform does, so it seemed useful to add shortform. I'm not sure if Open Threads still provide a particular service that shortform doesn't provide.

In _this_ case, however, I think the Alignment Open Thread serves a bit of a different purpose – it's a place to spark low-key conversation between AF members. (Non-AF members can c... (read more)

2 · Vanessa Kosoy · 2y: Actually, now I'm confused. I just posted a shortform, but I don't see where it appears on the main page? There is "AI Alignment Posts", which only includes the "longforms", and there is "recent discussion", which only includes the comments. Does it mean nobody sees the shortform unless they open my profile?
1 · Vanessa Kosoy · 2y: Hmm, this seems like an informal cultural difference that isn't really enforced by the format. Technically, people can comment on the shortform as easily as on open thread comments. So, I am not entirely sure whether everyone perceives it this way (and will continue to perceive it this way).

Planning of vengeance continues apace, either way.
