Some recent survey papers on (mostly near-term) AI safety, security, and assurance

As part of a project I am doing at work, I took a look around to find recent overview / survey / literature review papers on several topics related to AI safety, security, and assurance. The focus was primarily on nearer-term issues, but with an eye towards longer-term risks as well. Several of the papers I found will be familiar to many people here, but others might not be, so I am sharing this list in case it is helpful to anybody else.

Note that this is not intended to be a comprehensive list, even for the topics covered. I tried to find between one and four overview or survey papers for each topic, but no more than that: I and/or someone on my team will need to actually read through all of these, after all, so I didn't want too many papers. I did not include blog posts or book-length sources, such as Russell's Human Compatible or Christian's The Alignment Problem. I also mostly left out papers more than a year or two old, which is why some of the better-known earlier papers are missing.

The specific topics in this list are related to the project I'm doing at work, so I only focused on trying to find papers related to these topics. However, I am also very interested in recent survey / lit review papers on other related topics, so if you know of any good ones please link to them in the comments. And of course, if you know of other good survey / lit review papers on these specific topics, please let me know about those.

General
- Toreini, et al (July 2020), Technologies for Trustworthy Machine Learning: A Survey in a Socio-Technical Context (mainly focused on ethics but also including safety & security)
- Huang, et al (2018), A Survey of Safety and Trustworthiness of Deep Neural Networks: Verification, Testing, Adversarial Attack and Defence, and Interpretability
- Osoba, et al (RAND, Feb 2020), Steps Towards Value-Aligned Systems
- On ML in smart cities: Ahmad, et al (Dec 2020), Developing Future Human-Centered Smart Cities: Critical Analysis of Smart City Security, Interpretability, and Ethical Challenges
Side effects
- Saisubramanian, et al (Aug 2020), Avoiding Negative Side Effects due to Incomplete Knowledge of AI Systems
Out of distribution (OOD) robustness
- Koh, et al (Dec 2020), WILDS: A Benchmark of in-the-Wild Distribution Shifts (benchmark paper, but with several sections that provide an overview)
- Related Work section (section 4) of Subbaswamy, et al (JHU, Oct 2020), Evaluating Model Robustness to Dataset Shift
Interpretability
- Appendix VI(C) of Brundage, et al (April 2020), Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
- Dao & Lee (Dec 2020), Demystifying Deep Neural Networks Through Interpretation: A Survey
- Molnar, et al (Oct 2020), Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges
AI security
- Andrew Lohn (CSET, Dec 2020), Hacking AI
- Goldblum, et al (Dec 2020), Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
- McGraw, et al (Jan 2020), An Architectural Risk Analysis of Machine Learning Systems: Toward More Secure Machine Learning
Human-Machine Interactions
- Dafoe, et al (DeepMind, Dec 2020), Open Problems in Cooperative AI
- Calinescu, et al (AAIP, May 2019), Socio-Cyber-Physical Systems: Models, Opportunities, Open Challenges
- Sujan, et al (Nov 2019), Human factors challenges for the safe use of artificial intelligence in patient care
Systems Engineering
- Görkem Giray (Dec 2020), A Software Engineering Perspective on Engineering Machine Learning Systems: State of the Art and Challenges
- Assuring Autonomy International Programme (AAIP) Body of Knowledge (also look at other papers linked on their website)
- Harel, et al (Nov 2019), Autonomics: In Search of a Foundation for Next Generation Autonomous Systems
Testing, Evaluation, Verification, and Validation (TEV&V)
- Flournoy, et al (CSET, Oct 2020), Building Trust through Testing
- Appendix VI(A) of Brundage, et al (April 2020), Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (formal verification)
- Corso, et al (May 2020), A Survey of Algorithms for Black-Box Safety Validation
- Borg, et al (Dec 2018), Safely Entering the Deep: A Review of Verification and Validation for Machine Learning and a Challenge Elicitation in the Automotive Industry
Governance
- Brundage, et al (April 2020), Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims (long)
- ÓhÉigeartaigh, et al (May 2020), Overcoming Barriers to Cross-cultural Cooperation in AI Ethics and Governance
AGI / TAI
- Evan Hubinger (MIRI, Dec 2020), An overview of 11 proposals for building safe advanced AI
- Critch and Krueger (CHAI, May 2020), AI Research Considerations for Human Existential Safety (ARCHES) (focus on governance)
