Mark Xu

# Posts

Sorted by New

Understanding “Deep Double Descent”

https://arxiv.org/abs/1806.00952 gives a theoretical argument suggesting that SGD converges to a point that is very close in L2 norm to its initialization. Since NNs are often initialized with extremely small weights, this amounts to implicit L2 regularization.
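A minimal sketch of the phenomenon (my illustration, not from the paper): for an overparameterized least-squares problem, plain gradient descent only ever moves the weights within the row space of the data, so starting from a tiny initialization it ends up near the minimum-L2-norm interpolating solution, i.e. near the init in L2 norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 50                        # fewer samples than parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w = 1e-3 * rng.normal(size=d)        # tiny initialization
w0 = w.copy()
lr = 0.01
for _ in range(20_000):              # plain gradient descent on squared loss
    grad = X.T @ (X @ w - y) / n
    w -= lr * grad

# Minimum-L2-norm solution that fits the data exactly, for comparison
w_min = X.T @ np.linalg.solve(X @ X.T, y)

print(np.linalg.norm(w - w0))        # distance traveled from the init
print(np.linalg.norm(w - w_min))     # nearly zero: GD found the min-norm solution
```

Because the init is tiny, "close to the min-norm solution" and "close to the init" coincide here; with a large initialization the implicit-regularization story would look different.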

My rough take: https://elicit.ought.org/builder/oTN0tXrHQ

Three buckets, similar to Ben Pace's:

1. 5% chance that current techniques just get us all the way there, e.g. something like GPT-6 is basically AGI
2. 10% chance AGI doesn't happen this century, e.g. humanity starts taking this seriously and decides to hold off, combined with the problem being technically difficult enough that small groups can't really build AGI themselves
3. 50% chance that something like current techniques plus some number of new insights gets us to AGI