This work has been done in the context of SaferAI’s work on risk assessment. Equal contribution by Eli and Joel. I'm sharing this writeup in the form of a Google Doc and reproducing the summary below. Disclaimer: this writeup is context for upcoming experiments, not complete work. As such it...
The programme thesis of Davidad's agenda to develop provably safe AI has just been published. For context, Davidad is a Programme Director at ARIA who will grant somewhere between £10M and £50M over the next 3 years to pursue his research agenda. It is the most comprehensive public document detailing...
Summary TLDR Responsible Scaling Policies (RSPs) have recently been proposed as a way to continue scaling frontier large language models safely. While a welcome attempt at committing to specific practices, the RSP framework is: 1. missing core components of basic risk management procedures (Section 2 & 3) 2....
This post was written by Fabien at SaferAI[1]. Simeon has prompted Fabien in relevant directions and has provided valuable feedback. Thanks to Jean-Stanislas Denain, Alexandre Variengien, Charbel-Raphael Segerie, and Nicole Nohemi for providing helpful feedback on early experiments and drafts of this post. In this post * I describe a...