Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

by David Scott Krueger (formerly: capybaralet)

6th Aug 2019

Mentioned in

[AN #62] Are adversarial examples caused by real but imperceptible features?


abramdemski, 6y

I am a bit surprised to see you begin this post by saying there seems to be a consensus that people shouldn't worry about capabilities consequences of their work, but then, I come from the MIRI-influenced crowd. I agree that it would be good to have a lot more clarity on how to think about this.

I agree it could be somewhat good for MIRI to have a hit ML publication, particularly if it was something unlikely to shift progress significantly. I could imagine a universe where this happened if MIRI happened upon a very interesting safety-advanced thing, the way adversarial counterexamples were this big new thing slightly outside the usual ML way of doing business (i.e., not achieving high scores on a task with some improved technique). But it seems fairly unlikely to be worth it to try to play the usual ML game at the level of top ML groups simply for the sake of prestige, because it is likely too hard to gain prestige that way with so many others trying. It seems better in spirit to gain credibility by doing what MIRI does best and getting recognition for what's good (of the open research). I suspect we have some deep disagreements about background models.

I think the best way to reach ML people in the long run is not through credibility, but through good arguments presented well. Let me clarify: credibility/prestige definitely play a huge role in what the bulk of people think. But the credibility system is good enough that the top credible people are really pretty smart, so to an extent can be swayed by good arguments presented well. This case can definitely be overstated and I feel like I'm presenting a picture which will rightly be criticised as over-optimistic. But I think there are some success stories, and it's the honest leverage path (in contrast to fighting for prestige in a system in which lots of people are similarly doing so).

Anyway, I've hardly said anything about your main point. I don't know how to think about it, and I wish I did. I usually try to think about differential progress and then fail, and fall back on an assessment of how surprised I'd be if something led to big AI progress, and am cautious if it seems within the realm of possibility.


David Scott Krueger (formerly: capybaralet), 6y

I do think this is an overly optimistic picture. The amount of traction an argument gets seems to be something like a product of how good the argument is, how credible those making the argument are, and how easy it is to process the argument.

Also, regarding this:

"But the credibility system is good enough that the top credible people are really pretty smart, so to an extent can be swayed by good arguments presented well."

It's not just intelligence that determines if people will be swayed; I think other factors (like "rationality", "open-mindedness", and other personality factors) play a very big role.
