I think forcing people to publicly endorse policies
Saying "If this happened, it would solve the problem" is not to endorse a policy. (though perhaps literally shouting from the rooftops might be less than ideal)
It's entirely possible to state both "If x happened, it'd solve the problem", and "The policy we think is most likely to be effective in practice is Y". They can be put in the same statement quite simply.
It's reasonable to say that this might not be the most effective communication strategy. (though I think on balance I'd disagree)
It's not reasonable to say that this amounts to publicly endorsing a policy.
...if we ended capitalism, it would solve climate change. Though true...
This seems an unhelpful parallel, first because it's not clearly true. (In particular "ended capitalism" isn't "ended capitalism, and replaced it with communism", nor "ended capitalism overnight without a plan to replace it").
Second, because the proposal in this case is to refrain from a radically disruptive change to society, not to actively enact one.
The logic of the point you're making is reasonable, but the parallel has a bunch of baggage that reduces overall clarity IMO.
...because there are much more effective ways of addressing climate change than starting a communist revolution...
This part isn't even a parallel: here, even if successful, a communist revolution wouldn't be the most effective approach, whereas a sufficiently good pause, if successful, would be.
The specific conversation is much better than nothing - but I do think it ought to be emphasized that solving all the problems we're aware of isn't sufficient for safety. We're training on the test set.[1]
Our confidence levels should reflect that - but I expect overconfidence.
It's plausible that RSPs could be net positive, but I think that, given successful coordination, [vague and uncertain] beats [significantly more concrete, but overconfident].
My presumption is that without good coordination (a necessary condition being cautious decision-makers), things will go badly.
RSPs seem likely to increase the odds we get some international coordination and regulation. But to get sufficient regulation, we'd need the unknown unknowns issue to be covered at some point. To me this seems simplest to add clearly and explicitly from the beginning. Otherwise I expect regulation to adapt to issues for which we have concrete new evidence, and to fail to adapt beyond that.
Granted that you're not the median voter/decision-maker - but you're certainly one of the most influential voices on the issue, if not the most influential. It seems important not to underestimate your capacity to change people's views before figuring out a compromise to aim for (I'm primarily thinking of government people, who seem more likely to have views that might change radically based on a few conversations). But I'm certainly no expert on this kind of thing.
I do wonder whether it might be helpful not to share all known problems publicly on this basis - I'd have somewhat more confidence in safety measures that succeeded in solving some problems of a type the designers didn't know about.
Thanks for writing this.
I'd be interested in your view on the comments made on Evan's RSP post w.r.t. unknown unknowns. I think aysja put it best in this comment. It seems important to move the burden of proof.
Would you consider "an unknown unknown causes a catastrophe" to be a "concrete way in which they fail to manage risk"? Concrete or not, this seems sufficient grounds to stop, unless there's a clear argument that a bit more scaling actually helps for safety. (I'd be interested in your take on that - e.g. on what speed boost you might expect with your own research, given AI assistants of some level)
By default, I don't expect the "affirmative case for safety that will require novel science" to be sufficient if it ends up looking like "We developed state of the art tools that address all known problems, and we don't expect others".
On the name, it's not 'responsible' that bothers me, but rather 'scaling'.
"Responsible x-ing policy" gives the strong impression that x-ing can be done responsibly, and that x-ing will continue. I'd prefer e.g. "Responsible training and deployment policy". That way scaling isn't a baked in presumption, and we're naming things that we know can be done responsibly.
This is not what most people mean by "for personal gain". (I'm not disputing that Alice gets personal gain)
Insofar as the influence is required for altruistic ends, aiming for it doesn't imply aiming for personal gain.
Insofar as the influence is not required for altruistic ends, we have no basis to believe Alice was aiming for it.
"You're just doing that for personal gain!" is not generally taken to mean that you may be genuinely doing your best to create a better world for everyone, as you see it, in a way that many would broadly endorse.
In this context, an appropriate standard is the post's own:
Does this "predictably lead people to believe false things"?
Yes, it does. (if they believe it)
"Lying for personal gain" is a predictably misleading description, unless much stronger claims are being made about motivation (and I don't think there's sufficient evidence to back those up).
The "lying" part I can mostly go along with. (though based on a contextual 'duty' to speak out when it's unusually important; and I think I'd still want to label the two situations differently: [not speaking out] and [explicitly lying] may both be undesirable, but they're not the same thing)
(I don't really think in terms of duties, but it's a reasonable shorthand here)
Ah okay - thanks. That's clarifying.
Agreed that the post is at the very least not clear.
In particular, it's obviously not true that [if we don't stop today, there's more than a 10% chance we all die], and I don't think [if we never stop, under any circumstances...] is a case many people would be considering at all.
It'd make sense to be much clearer on the 'this' that "many people believe".
(and I hope you're correct on P(doom)!)
[I agree with most of this, and think it's a very useful comment; just pointing out disagreements]
For instance, I think that well implemented RSPs required by a regulatory agency can reduce risk to <5% (partially by stopping in worlds where this appears needed).
I assume this would be a crux with Connor/Gabe (and I think I'm at least much less confident in this than you appear to be).
Calling something a "pragmatic middle ground" doesn't imply that there aren't better options
I think the objection here is more about what is loosely suggested by the language used, and what is not said - not about logical implications. What is loosely suggested by the ARC Evals language is that it's not sensible to aim for the more "extreme" end of things (pausing), and that this isn't worthy of argument.
Perhaps ARC Evals have a great argument, but they don't make one. I think it's fair to say that they argue the middle ground is practical. I don't think it can be claimed that they argue it's pragmatic until they address both the viability of other options and the risks of the various courses. Doing a practical thing that would predictably lead to higher risk is not pragmatic.
It's not clear what the right course is here, but making no substantive argument gives a completely incorrect impression. If they didn't think it was the right place for such an argument, then it'd be easy to say that: that this is a complex question, that it's unclear this course is best, and that RSPs vs Pause vs ... deserves a lot more analysis.
The post presupposes a communication/advocacy norm and states violations of this norm should be labeled "lying". I'm not sure I'm sold on this communication norm in the first place.
I'd agree with that, but I do think that in this case it'd be useful for people/orgs to state both a [here's what we'd like ideally] and a [here's what we're currently pushing for]. I can imagine many cases where this wouldn't hold, but I don't see the argument here. If there is an argument, I'd like to hear it! (fine if it's conditional on not being communicated further)
I agree with most of this, but I think the "Let me call this for what it is: lying for personal gain" section is silly and doesn't help your case.
The only sense in which it's clear that it's "for personal gain" is that it's lying to get what you want.
Sure, I'm with you that far - but if what someone wants is [a wonderful future for everyone], then that's hardly what most people would describe as "for personal gain".
By this logic, any instrumental action taken towards an altruistic goal would be "for personal gain".
That's just silly.
It's unhelpful too, since it gives people a somewhat legitimate reason to dismiss the broader point.
Of course it's possible that the longer-term altruistic goal is just a rationalization, and people are after power for its own sake, but I don't buy that this is often true - at least not in any clean [they're doing this and only this] sense. (one could have similar altruistic-goal-is-rationalization suspicions about your actions too)
In many cases, I think overconfidence is sufficient explanation.
And if we get into "Ah, but isn't it interesting that this overconfidence leads to power gain", then I'd agree - but then I claim that you should distinguish [conscious motivations] from [motivations we might infer by looking at the human as a whole, deep shadowy subconscious included]. If you're pointing at the latter, please make that clear. (and we might also ask "What actions are not for personal gain, in this sense?")
Again, entirely with you on the rest.
I'm not against accusations that may hurt feelings - but I think that more precision would be preferable here.
Thanks, this seems very reasonable. I'd missed your other comment.
(Oh and I edited my previous comment for clarity: I guess you were disagreeing with my clumsily misleading wording, rather than what I meant(??))
This is clarifying, thanks.
A few thoughts:
Hmm. Perhaps the thing I'd endorse is more [include this in every detailed statement about policy/regulation], rather than [shout it from the rooftops].
So, for example, if the authors agree with the statement, I think this should be in:
I'm fine if we don't start printing it on bumper stickers.
The outcome I'm interested in is something like: every person with significant influence on policy knows that this is believed to be a good/ideal solution, and that the only reasons against it are based on whether it's achievable in the right form.
If ARC Evals aren't saying this, RSPs don't include it, and many policy proposals don't include it..., then I don't expect this to become common knowledge.
We're much less likely to get a stop if most people with influence don't even realize it's the thing that we'd ideally get.