This is a deep learning curriculum with a focus on topics relevant to large language model alignment. It is centered around papers and exercises, and is biased towards my own tastes.
It is targeted at newcomers to deep learning who are already familiar with the basics, but I expect it to be pretty challenging for most such people, especially without mentorship. So I have included some advice about prerequisites and more accessible alternatives.
Was definitely expecting this to be about curriculum learning :P
For the section on interpretability, I actually divide modern-day interpretability (I'm less clear on the ambitious future) into several prongs, and am curious what you think:
I've not mentally carved things up that way before, but they do seem like different flavors of work (with 1 and 2 being closely related).
Another distinction I sometimes consider is between exploring a network for interpretable pieces ("finding things we understand") versus trying to exhaustively interpret part of a network ("finding things we don't understand"). But this distinction doesn't carve up existing work very evenly: the only thing I can think of that I'd put in the latter category is the work on Artificial Artificial Neural Networks.
That seems like a pretty reasonable breakdown (though note that 2 inherently needs to come after 1).