We're announcing a new course designed to introduce students with a background in machine learning to the most relevant concepts in empirical ML-based AI safety. The course is available publicly here.
AI safety is a small but rapidly growing field, and both younger and more experienced researchers are interested in contributing. However, interest in the field is not enough: researchers are unlikely to make much progress until they understand existing work, which is very difficult if they are simply presented with a list of posts and papers to read. As such, there is a need for curated AI safety curricula that can get new potential researchers up to speed.
Richard Ngo’s AGI Safety Fundamentals filled a huge hole in AI safety education, giving hundreds of people a better understanding of the landscape. In our view, it is the best resource for anyone looking for a conceptual overview of AI safety.
However, until now there has been no course that aims to introduce students to empirical, machine learning-based AI safety research, which we believe is a crucial part of the field. There has also been no course structured like a typical university course, complete with lectures, readings, and assignments; this structure makes it more likely that the material could be taught at a university. Lastly, and perhaps most importantly, most existing resources assume that the reader has higher-than-average openness to AI x-risk. If we are to onboard more machine learning researchers, this should not be taken for granted.
In this post, we present a new, publicly-available course that Dan Hendrycks has been working on for the last eight months: Introduction to ML Safety. The course is a project of the Center for AI Safety.
The purpose of Introduction to ML Safety is to introduce people familiar with machine learning and deep learning to the latest directions in empirical ML safety research and explain existential risk considerations. Our hope is that the course can serve as the default for ML researchers interested in doing work relevant to AI safety, as well as undergraduates who are interested in beginning research in empirical ML safety. The course could also potentially be taught at universities by faculty interested in teaching it.
The course contains research areas that many reasonable people concerned about AI x-risk think are valuable, though we exclude those that don’t (yet) have an empirical ML component, as they aren’t really in scope for the course. Most of the areas in the course are also covered in Open Problems in AI X-Risk.
The course is still very much in beta, and we will make improvements over the coming year. Part of the improvements will be based on feedback from students in the ML Safety Scholars summer program.
The course is divided into seven sections, covered below. Each section has lectures, readings, assignments, and (in progress) course notes. Below we present descriptions of each lecture, as well as a link to the YouTube video. The slides, assignments, and notes can be found on the course website.
The background section introduces the course and also gives an overview of deep learning concepts that are relevant to AI safety work. It includes the following lectures:
This section includes a written assignment and a programming assignment designed to help students review deep learning concepts.
In Complex Systems for AI Safety, we discussed the systems view of safety. It’s unclear to what extent AI safety resembles particular other safety problems, such as making cars, planes, or software programs safer. The systems view of safety provides general, abstract safety lessons that have been applicable across many different industries. Many of these industries, such as information security and the defense community, must contend with powerful adversarial actors, not unlike AI safety. The systems view of safety thus provides a good starting point for thinking about AI safety. The hazard analysis section of the course discusses foundational systems safety concepts and applies them to AI safety. It includes the following lectures:
This section includes a written assignment where students test their knowledge of the section.
In Open Problems in AI X-Risk, we covered the relevance of robustness to AI safety. Robustness focuses on ensuring models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We expect such events will be encountered frequently by future AI systems. This section includes the following lectures:
This section includes a written assignment where students test their knowledge of the section, and a programming assignment where students implement various methods in adversarial robustness.
We also covered monitoring in Open Problems in AI X-Risk, which we define as research that reduces exposure to hazards as much as possible and allows their identification before they grow. The course covers monitoring in more depth, and includes the following lectures:
This section includes a written assignment where students test their knowledge of the section, and two programming assignments: one focused on anomaly detection, and the other on Trojan detection.
We also cover alignment, which has varying definitions; we define it as reducing inherent model hazards: hazards that result from models (explicitly or operationally) pursuing the wrong goals. The course covers the following areas, which are also covered in Open Problems in AI X-Risk.
This section includes a written assignment where students test their knowledge of the section, and a programming assignment where students use transparency tools to identify inconsistencies with language models trained to model ethics.
In addition to directly reducing hazards from AI systems, AI can be used in several ways to make the world better equipped to handle the development of AI, by improving sociotechnical factors such as decision-making ability and safety culture. This section covers a few such areas, which are also covered in Open Problems in AI X-Risk.
As is typical for a topics course, the last section covers the broader importance of the concepts covered earlier: namely, existential risk and possible existential hazards. We also cover strategies for tractably reducing existential risk, following Pragmatic AI Safety and X-Risk Analysis For AI Research.
This section includes a final reflection assignment in which students review the course; notably, it encourages students to evaluate AI safety arguments for themselves.
All course content is available online, so anyone can work through it on their own. The course is currently being trialed by the students in ML Safety Scholars, who are providing valuable feedback.
We are interested in running additional formal versions of this course in the future. If you have the operational capacity to run this course virtually, or are interested in running it at your university, please let us know!
If you notice bugs in the lectures and/or assignments, you can message any of us or email email@example.com.
Dan Hendrycks would like to thank Oliver Zhang, Rui Wang, Jason Ding, Steven Basart, Thomas Woodside, Nathaniel Li, and Joshua Clymer for helping with the design of the course, and the students in ML Safety Scholars for testing the course.
Just skimmed the course. One suggestion (will make more later): adding the Goal Misgeneralization paper from Langosco et al. as a core reading in the week on Detecting and Forecasting Emergent Behavior.
Roughly how many hours do you expect it takes to complete the course?
Thank you so much! While I've just skimmed the material, this seems like a really high-quality course.
One suggestion: complementing each section with a set of flashcards students could use for spaced repetition learning (e.g. an Anki deck). These could include both definitions of terms and explanations of concepts, with plenty of visuals. Normally, if I were to go through the course, I would write flashcards myself, but it would be faster for me and feel more complete if they were baked into the course.