Mark Xu

Comments

Understanding “Deep Double Descent”

https://arxiv.org/abs/1806.00952 gives a theoretical argument suggesting that SGD will converge to a point very close, in L2 norm, to the initialization. Since NNs are often initialized with extremely small weights, this amounts to implicit L2 regularization.
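A minimal numpy sketch of the intuition (my illustration, not the paper's construction): for an underdetermined least-squares problem, gradient descent only ever moves within the row space of the data, so it converges to the interpolating solution closest to the initialization. With a near-zero init, that is approximately the minimum-L2-norm solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                    # underdetermined: more parameters than data points
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w0 = 0.01 * rng.normal(size=d)    # small init, as is typical for NNs
w = w0.copy()
lr = 0.01
for _ in range(20_000):           # full-batch gradient descent on squared loss
    w -= lr * X.T @ (X @ w - y) / n

# Each gradient step is a combination of rows of X, so w - w0 stays in
# the row space of X.  The minimum-L2-norm interpolating solution is the
# pseudoinverse solution:
w_min_norm = np.linalg.pinv(X) @ y

print(np.linalg.norm(X @ w - y))       # training residual: essentially zero
print(np.linalg.norm(w - w_min_norm))  # bounded by ||w0||, so small for small init
```

Because the gap between the converged `w` and the min-norm solution is exactly the null-space component of `w0`, shrinking the initialization shrinks the gap, which is the sense in which small-init SGD behaves like L2 regularization here.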

Forecasting Thread: AI Timelines

My rough take: https://elicit.ought.org/builder/oTN0tXrHQ
 

Three buckets, similar to Ben Pace's:

  1. 5% chance that current techniques just get us all the way there, e.g. something like GPT-6 is basically AGI
  2. 10% chance AGI doesn't happen this century, e.g. humanity sort of starts taking this seriously and decides we ought to hold off, and the problem is technically difficult enough that small groups can't really make AGI themselves
  3. 50% chance that something like current techniques and some number of new insights gets us to AGI. 

If I thought about this for 5 additional hours, I could imagine assigning the following probability ranges (in %) to the scenarios:

  1. [1, 25]
  2. [1, 30]
  3. [20, 80]