activatedgeek

Comments

Understanding “Deep Double Descent”

I want to point out some recent work by Andrew Gordon Wilson's group - https://cims.nyu.edu/~andrewgw/#papers.

In particular, https://arxiv.org/abs/2003.02139 looks at double descent from the perspective that parameter count is a bad proxy for model complexity/capacity. They argue that effective dimensionality is what we should be plotting against instead. Relatedly, double descent effectively vanishes (https://arxiv.org/abs/2002.08791) when we use Bayesian model averaging instead of point estimates.
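To make "effective dimensionality" a bit more concrete, here is a rough sketch of the quantity as I understand it: N_eff(H, z) = sum_i λ_i / (λ_i + z), where λ_i are the eigenvalues of the Hessian of the loss and z > 0 is a regularization constant. The toy setup below (a linear model with squared-error loss, so the Hessian is X^T X) and the names `effective_dimensionality`, `X`, and `z` are my own assumptions for illustration, not code from the papers.

```python
import numpy as np


def effective_dimensionality(hessian, z=1.0):
    """Sum of lambda_i / (lambda_i + z) over the Hessian eigenvalues."""
    # eigvalsh assumes a symmetric matrix, which holds for a Hessian.
    eigvals = np.linalg.eigvalsh(hessian)
    # Clip tiny negative values caused by floating-point error.
    eigvals = np.clip(eigvals, 0.0, None)
    return float(np.sum(eigvals / (eigvals + z)))


# Toy assumption: 20 data points, 50 parameters, loss 0.5 * ||Xw - y||^2,
# whose Hessian with respect to w is X^T X (rank at most 20).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
H = X.T @ X

n_params = H.shape[0]
n_eff = effective_dimensionality(H, z=1.0)
print(f"parameters: {n_params}, effective dimensionality: {n_eff:.2f}")
```

The model has 50 parameters, but the data can only determine at most 20 directions in parameter space, so the effective dimensionality stays below 20 no matter how many parameters we add. That is the sense in which raw parameter count can overstate complexity.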