David Krueger

David Krueger's Comments

What I talk about when I talk about AI x-risk: 3 core claims I want machine learning researchers to address.

I'm definitely interested in hearing other ways of splitting it up! That's one of the points of making this post. I'm also interested in what you think of the way I've done the breakdown! Since you proposed an alternative, I guess you might have some thoughts on why it could be better :)

I see your points as being directed more at increasing ML researchers' respect for AI x-risk work and their likelihood of doing relevant work. Maybe that should in fact be the goal. It seems to be a more common goal.

I would describe my goal (with this post, at least, and probably with most conversations I have with ML people about x-risk) as something more like: "get them to understand the AI safety mindset, and where I'm coming from; get them to really think about the problem and engage with it". I expect a lot of people here would reason, in a very narrow and myopic consequentialist way, that this is not as good a goal, but I'm unconvinced.

A list of good heuristics that the case for AI x-risk fails

Another important improvement I should make: rephrase these to have the type signature of "heuristic"!

A list of good heuristics that the case for AI x-risk fails

Oh sure, in some special cases. I don't think this experience was particularly representative.

A list of good heuristics that the case for AI x-risk fails

Yeah, I've had conversations with people who shot down a long list of concerned experts, e.g.:

  • Stuart Russell is GOFAI ==> out-of-touch
  • Shane Legg doesn't do DL, does he even do research? ==> out-of-touch
  • Ilya Sutskever (and everyone at OpenAI) is crazy, they think AGI is 5 years away ==> out-of-touch
  • Anyone at DeepMind is just marketing their B.S. "AGI" story or drank the koolaid ==> out-of-touch

But then, even the big 5 of deep learning have all said things that can be used to support the case....

So it kind of seems like there should be a compendium of quotes somewhere, or something.

Clarifying some key hypotheses in AI alignment

Nice chart!

A few questions and comments:

  • Why the arrow from "agentive AI" to "humans are economically outcompeted"? The explanation makes it sound like it should point to "target loading fails"??
  • Suggestion: make the blue boxes without parents more apparent? e.g. a different shade of blue? Or all sitting above the other ones? (e.g. "broad basin of corrigibility" could be moved up and left).

A list of good heuristics that the case for AI x-risk fails

I pushed this post out since I think it's good to link to it in this other post. But there are at least 2 improvements I'd like to make and would appreciate help with:

AI Safety "Success Stories"

Does an "AI safety success story" encapsulate just a certain trajectory in AI (safety) development?

Or does it also include a story about how AI is deployed (and by who, etc.)?

I like this post a lot, but I think it ends up being a bit unclear, because I don't think everyone has the same use cases in mind for the different technologies underlying these scenarios, and/or agrees on how safety research is supposed to contribute to success in these different scenarios... Maybe fleshing out the success stories, or referencing some more in-depth elaborations of them, would make this clearer?

AI Safety "Success Stories"

I'm going to dispute a few cells in your grid.

  • I think the pivotal tool story has low reliance on human safety (although I'm confused by that row in general).
  • Whether sovereigns would require restricted access is unclear. This is basically the question of whether single-agent, single-user alignment will likely produce a solution to multi-agent, multi-user alignment (in a timely manner).
  • ETA: the "interim quality of life improver" seems to roughly be talking about episodic RL, which I would classify as "medium" autonomy.

AI Safety "Success Stories"

I don't understand what you mean by "Reliance on human safety". Can you clarify/elaborate? Is this like... relying on humans' (meta-)philosophical competence? Relying on not having bad actors? etc...

AI Safety "Success Stories"

While that's true to some extent, a lot of research does seem to be motivated much more by some of these scenarios than by others. For example, work on safe oracle designs seems primarily motivated by the pivotal tool success story.
