I think there needs to be individual decision-making (on the part of both organizations and individual researchers, especially in light of the unilateralists' curse), alongside a much broader discussion about how the world should handle unsafe machine learning and more advanced AI.
I very much don't think it would be wise for the AI safety community to debate and come up with shared, semi-public guidelines for, essentially, what to withhold from the broader public, without input from the wider ML / AI research community, who are impacted and whose work is a big part of what we are discussing. That community needs to be engaged in any such discussions.
There are some intermediate options available between "full secrecy" and "full publication"... and I haven't seen anyone mention them...
OpenAI's phased release of GPT-2 seems like a clear example of exactly this. There is also a forthcoming paper from Toby Shevlane looking at the internal deliberations around this, in addition to his existing work on the question of how disclosure potentially affects misuse.
The first thing I would note is that stakeholders need to be involved in making any guidelines, and that pushing for guidelines from the outside is unhelpful, if not harmful, since it pushes participants to be defensive about their work. There is also an extensive literature discussing the general issue of information dissemination hazards and the issues of regulation in other domains, such as nuclear weapons technology, biological and chemical weapons, and similar.
There is also a fair amount of ongoing work on synthesizing this literature and the implications for AI. Some of it is even on this site. For example, see: https://www.lesswrong.com/posts/RY9XYoqPeMc8W8zbH/mapping-downside-risks-and-information-hazards and https://www.lesswrong.com/posts/6ur8vDX6ApAXrRN3t/information-hazards-why-you-should-care-and-what-you-can-do
So there is already a great deal of discussion about this, and plenty you should read on the topic - I suggest starting with the paper that provided the name for your post, and continuing with the relevant sections of GovAI's research agenda.
Oh. Right. I should have gotten the reference, but wasn't thinking about it.
I'd focus even more, (per my comment to Vanniver's response,) and ask "What parts of OpenAI are most and least valuable, and how do these relate to their strategy - and what strategy is best?"
I would reemphasize that "does OpenAI increase risk?" is a counterfactual question. That means we need to be clearer about what we are asking - as a matter of predicting what the counterfactuals actually are - and consider strategy options for going forward. This is a major set of questions, and "increasing or decreasing risk" as a single metric isn't enough to capture much of interest.
For a taste of what we'd want to consider, what about the following:
Are we asking OpenAI to pick a different, "safer" strategy?
Perhaps they should focus more on hiring people to work on safety and strategy, and hire fewer capabilities researchers. That brings us to the Dr. Wily / Dr. Light question: perhaps Dr. Capabilities B. Wily shouldn't be hired, and Dr. Safety R. Light should be instead. That means Wily does capabilities research elsewhere, perhaps with more resources, and Light does safety research at OpenAI. But the counterfactual is that Light would do (perhaps slightly less well funded) safety research anyway, and Wily would work on (approximately as useful) capabilities research at OpenAI - advantaging OpenAI in any future capabilities races.
Are we asking OpenAI to be larger - and, if needed, should we find them funding?
Perhaps they should hire both, along with all of Dr. Light's and Dr. Wily's research teams. Fast growth will dilute OpenAI's culture, but give them an additional marginal advantage over other groups. Perhaps bringing both teams in would help OpenAI in race dynamics, but also make it more likely that they'd engage in such races.
How much funding would this need? Perhaps none - they have cash, they just need to do this. Or perhaps tons, and we need them to be profitable, and focus on that strategy, with all of the implications of that. Or perhaps a moderate amount, and we just need OpenPhil to give them another billion dollars, and then we need to ask about the counterfactual impact of that money.
Or OpenAI should focus on redirecting their capabilities staff to work on safety, and have a harder time hiring the best people who want to work on capabilities? Or OpenAI should be smaller and more focused, and reserve cash?
These are all important questions, but they need much more time than I - or, I suspect, most of the readers here - have available, and are probably already being discussed more usefully by both OpenAI and their advisors.
Now the perhaps harder step is trying to get traction on them.
Yes, very much so. We're working on a few parts of this now, as part of a different project, but I agree that it's tricky. And there are a number of other things that seem like potentially very useful projects if others are interested in collaborations, or just some ideas / suggestions about how they could be approached.
(On the tables: unfortunately, they were pasted in as images from another program. We should definitely see if we can get higher-resolution versions, even if we can't convert them to text easily.)
I'm unsure whether GPT-3 can output, say, an IPython notebook to get the values it wants.
That would be really interesting to try...
(I really like this post, as I said to Issa elsewhere, but) I realized after discussing this earlier that I don't agree with a key part of the precise vs. imprecise model distinction.
> A precise theory is one which can scale to 2+ levels of abstraction/indirection.
>
> An imprecise theory is one which can scale to at most 1 level of abstraction/indirection.
I think this is wrong. More levels of abstraction are worse, not better. Specifically, if a model exactly describes a system at one level, any abstraction will lose predictive power. (Ignoring computational cost, which I'll get back to.) Quantum theory is more precisely predictive than Newtonian physics. The reason we can move up and down levels is that we understand the system well enough to quantify how much precision we are losing, not that we can move further without losing precision.
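To make the point concrete, here's a toy sketch (my own illustration, not from the original post): Newtonian mechanics is a useful abstraction over relativistic mechanics precisely because we can quantify how much predictive power the abstraction loses at any given speed.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def newtonian_ke(m, v):
    # Kinetic energy under the Newtonian abstraction.
    return 0.5 * m * v**2

def relativistic_ke(m, v):
    # Kinetic energy under the more precise relativistic theory.
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * m * C**2

def relative_error(v):
    # Fractional precision lost by using the Newtonian abstraction at speed v.
    m = 1.0  # mass cancels out of the ratio
    exact = relativistic_ke(m, v)
    return abs(exact - newtonian_ke(m, v)) / exact

# The loss is quantifiable: negligible at everyday speeds, large near c.
for frac in (0.001, 0.1, 0.5):
    print(f"v = {frac}c -> relative error {relative_error(frac * C):.3g}")
```

The abstraction isn't "scalable" because it loses nothing; it's usable because the loss is characterized well enough that we know when it is safe to ignore.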
The reason that precise theories are better is that they are tractable enough to quantify how far we can move away from them, and how much we lose by doing so. The problem with economics isn't that we don't have accurate enough models of human behavior to aggregate them, but that the inaccuracy isn't characterized precisely enough to let us understand how the uncertainty from psychology shows up in economics. For example, behavioral economics is partly useless here because we can't build equilibrium models from it - and the reason is that we can't quantify how its models are wrong. For economics, we're better off with the worse model of rational agents, which we know is wrong but can at least start to quantify by how much, so we can do economic analyses.
I think this is covered in my view of optimization via selection, where "direct solution" is the third option. Any one-shot optimizer is implicitly relying on an internal model completely for decision making, rather than iterating, as I explain there. I think that is compatible with the model here, but it needs to be extended slightly to cover what I was trying to say there.