Comments

"AI Safety" which often in practice means "self driving cars"

This may have been true four years ago, but ML researchers at leading labs rarely work directly on self-driving cars (e.g., research on sensor fusion); autonomous vehicles have not been a hot topic in quite a while. Fortunately, now that AGI-like chatbots are popular, we're moving out of the realm of talking about making very narrow systems safer. The association with AV was not that bad, since it was about getting many nines of reliability/extreme reliability, which was a useful subgoal. Unfortunately, the world has not been able to make a DL model completely reliable in any specific domain (not even MNIST).

Of course, they weren't talking about x-risks, but neither are industry researchers who use the word "alignment" today to mean fine-tuning a model to be more knowledgeable or making models better satisfy capabilities wants (sometimes dressed up as "human values").

If you want a word that reliably denotes catastrophic risks and is also mainstream, you'll need to make catastrophic risk ideas mainstream. Expect the word to be watered down for some time, or expect it not to go mainstream.

When ML models get more competent, ML capabilities researchers will have strong incentives to build superhuman models, and finding superhuman training techniques would be the main thing they'd work on. Consequently, when the problem is more tractable, I don't see why it'd be neglected by the capabilities community; it'd be unreasonable for profit maximizers not to have it as a top priority once it becomes tractable. So I don't see why alignment researchers have to work in this high-externality area now and ignore other safe alignment research areas (in practice, the alignment teams with compute are mostly just working on this area). I'd be in favor of figuring out how to get superhuman supervision for specific things related to normative factors/human values (e.g., superhuman wellbeing supervision), but researching superhuman supervision simpliciter will be the aim of the capabilities community.

Don't worry, the capabilities community will relentlessly maximize vanilla accuracy, and we don't need to help them.

I am strongly in favor of our very best content going on arXiv. Both communities should engage more with each other.

Here are some suggestions for posting to arXiv. As a rule of thumb, if the content of a blogpost didn't take >300 hours of labor to create, it probably should not go on arXiv. Maintaining a basic quality bar prevents arXiv from being overrun by people who like writing up many of their inchoate thoughts; publication standards are different on LW/AF than on arXiv. Even if a researcher spent many hours on a project, arXiv moderators do not want research that falls below a certain bar: they have reminded some professors that they will likely reject papers at the quality level of a Stanford undergraduate team project (e.g., http://cs231n.stanford.edu/2017/reports.html), so labor, topicality, and conforming to formatting standards are not sufficient for arXiv approval. Usually one's first research project won't be good enough for arXiv.

Furthermore, conceptual/philosophical pieces should probably be posted primarily to arXiv's .CY section (Computers and Society). For more technical deep learning content, do not make the mistake of only putting it on .AI; it should probably go on .LG (machine learning), .CV (computer vision), or .CL (NLP). arXiv's .ML section is for more statistical/theoretical machine learning audiences. For content to be approved without complications, it should conform to standard conference formatting (ICLR, ICML, NeurIPS, CVPR, ECCV, ICCV, ACL, EMNLP); this means automatic blogpost exporting is likely not viable. In trying to diffuse ideas to the broader ML community, we should avoid making the arXiv moderators mad at us.
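If it helps, here is a minimal sketch of what conforming to one of these templates looks like. It assumes the NeurIPS style file (neurips_2023.sty, downloaded from the conference site and placed next to the .tex file); the ICLR/ICML style files work analogously, and none of these specifics come from the original comment.

```latex
\documentclass{article}

% NeurIPS style file (assumed to sit next to this .tex file).
% No option = anonymous submission version; [preprint] is the non-anonymous
% version typically used for arXiv; [final] is the camera-ready version.
\usepackage[preprint]{neurips_2023}

\usepackage{amsmath}   % math support
\usepackage{graphicx}  % figures
\usepackage{hyperref}  % links and references

\title{Your Paper Title}

\author{
  Author Name \\
  Affiliation \\
  \texttt{email@example.com}
}

\begin{document}
\maketitle

\begin{abstract}
A one-paragraph summary of the contribution.
\end{abstract}

\section{Introduction}
Body text goes here, written to the venue's page limit and citation style.

\end{document}
```

Starting from a skeleton like this, rather than exporting a blogpost to TeX, is what keeps the formatting within what moderators expect.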

Here's a continual stream of related arXiv papers, available through Reddit and Twitter:

https://www.reddit.com/r/mlsafety/

https://twitter.com/topofmlsafety
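For anyone who wants to pull that stream programmatically rather than browse it, a rough sketch along these lines should work. It reads the subreddit's public JSON listing; the endpoint, response fields, and User-Agent requirement are standard Reddit conventions rather than anything from the links above, so treat the details as assumptions to verify.

```python
# Sketch: list recent posts from r/mlsafety via Reddit's public JSON listing.
# Assumes the standard https://www.reddit.com/r/<sub>/new.json endpoint; a
# descriptive User-Agent is needed or Reddit may reject the request.
import requests

def recent_mlsafety_posts(limit: int = 10) -> list[dict]:
    url = "https://www.reddit.com/r/mlsafety/new.json"
    headers = {"User-Agent": "mlsafety-feed-reader/0.1 (example script)"}
    resp = requests.get(url, headers=headers, params={"limit": limit}, timeout=10)
    resp.raise_for_status()
    children = resp.json()["data"]["children"]
    return [{"title": c["data"]["title"], "url": c["data"]["url"]} for c in children]

if __name__ == "__main__":
    for post in recent_mlsafety_posts():
        print(f"{post['title']}\n  {post['url']}")
```

A Twitter equivalent would need API credentials, so it's omitted here.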

I should say formatting is likely a large contributing factor to this outcome. Tom Dietterich, an arXiv moderator, apparently had a positive impression of the content of your grokking analysis. However, a submission will be more likely to go live on arXiv if it conforms to standard (ICLR, NeurIPS, ICML) formatting and isn't a blogpost automatically exported into a TeX file.