Quadratic Reciprocity — AI Alignment Forum

As someone with limited knowledge of AI or alignment, I found this post accessible. There were times when I thought I vaguely knew what Nate meant but would not have been able to explain it, so I'm recording my confusions here to come back to once I've read up more. (If anyone wants to answer any of these r/NoStupidQuestions-style questions, that would be very helpful too.)

"Your first problem is that the recent capabilities gains made by the AGI might not have come from gradient descent." This comes up in response to several of the plans. Is the idea that, during training, sufficiently advanced AIs gain capabilities both from gradient descent and from processing input / interacting with the world? Or does the second kind of gain only happen after training has finished? What does that concretely look like in ML?
Is much of the disagreement about these plans simply because others find a "sharp left turn" less likely than Nate does? Or is there broad agreement about that idea, with the disagreement being about which proposals might give us a shot at solving it?
What might an ambitious interpretability agenda focused on the sharp left turn and the generalization problem look like besides just trying harder at interpretability?
Another explanation of the "sharp left turn" would also be really helpful to me. At the moment, I can only explain why it happens through analogies to humans and apes, rather than giving a clear explanation, in ML/alignment terms, of why we should expect it by default.

Two-year update on my personal AI timelines

Quadratic Reciprocity · 3y · 50

As a result, my timelines have also concentrated around a somewhat narrower band of years. Previously, my probability increased from 10% to 60% over the ~32 years between ~2032 and ~2064; now this happens over the ~24 years between ~2026 and ~2050.

10% probability by 2026 (!!)

On how various plans miss the hard bits of the alignment challenge

Quadratic Reciprocity · 4y · 173

AI ALIGNMENT FORUM
AF
