This curriculum, a follow-up to the Alignment Fundamentals curriculum (the ‘101’ to this 201 curriculum), aims to give participants enough knowledge about alignment to understand the frontier of current research discussions. It assumes that participants have read through the Alignment Fundamentals curriculum, taken a course on deep learning, and taken a course on reinforcement learning (or have an equivalent level of knowledge).
Although these are the basic prerequisites, we expect that most people who intend to work on alignment should only read through the full curriculum after they have significantly more ML experience than listed above, since upskilling via their own ML engineering or research projects should generally be a higher priority for early-career alignment researchers.
When reading this curriculum, it’s worth remembering that the field of alignment aims to shape the goals of systems that don’t yet exist, so alignment research is often more speculative than research in other fields. You shouldn’t assume that there’s a consensus about the usefulness of any given research direction; instead, it’s often worth developing your own views about whether the techniques discussed in this curriculum might plausibly scale up to help align AGI.
The curriculum was compiled, and is maintained, by Richard Ngo. For now, it’s primarily intended to be read independently; once we’ve run a small pilot program, we’ll likely extend it to a discussion-based course.
Week 1: Further understanding the problem
Week 2: Decomposing tasks for better supervision
Week 3: Preventing misgeneralization
Week 4: Interpretability
Week 5: Reasoning about Reasoning
Weeks 6 & 7 (Track 1): Eliciting Latent Knowledge
Weeks 6 & 7 (Track 2): Agent Foundations
Weeks 6 & 7 (Track 3): Science of Deep Learning
Weeks 8 & 9: Literature Review or Project Proposal
See the full curriculum here. Note that the curriculum is still under revision, and feedback is very welcome!
I'd recommend replacing the older Does SGD Produce Deceptive Alignment? with the newer How likely is deceptive alignment?.
I considered this, but it seems like the latter is 4x longer while covering fairly similar content?
I've found that people often really struggle to understand the content from the former but get it when I give them the latter. Also, I think the latter post covers a lot of newer material that's not in the old one (e.g. different models of inductive biases).