As part of a project I am doing at work, I looked for recent overview, survey, and literature review papers on several topics related to AI safety, security, and assurance. The focus was primarily on nearer-term issues, but with an eye towards longer-term risks as well. Several of the papers I found will be familiar to many people here, but others might not be, so I am sharing this list in case it is helpful to anybody else.

Note that this is not intended to be a comprehensive list, even for the topics covered. I tried to find between 1 and 4 overview or survey papers for each topic, but no more than that - someone on my team or I will need to actually read through all of these, after all, so I didn't want too many papers. I did not include blog posts or book-length sources, such as Russell's Human Compatible or Christian's The Alignment Problem. I also mostly left out papers more than a year or two old, so it might look a bit odd that some of the more well-known earlier papers are missing.

The specific topics in this list are related to the project I'm doing at work, so I only looked for papers on these topics. However, I am also very interested in recent survey / lit review papers on other related topics, so if you know of any good ones, please link to them in the comments. And of course, if you know of other good survey / lit review papers on these specific topics, please let me know about those too.
