Understanding “Deep Double Descent”

I want to point out some recent work by Andrew Gordon Wilson's group.

In particular, this work looks at double descent from the perspective that parameter count is a poor proxy for model complexity/capacity. Rather, they argue, effective dimensionality is what we should be plotting against, and double descent effectively vanishes when we use Bayesian model averaging instead of point estimates.
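To make the parameters-versus-effective-dimensionality distinction concrete, here is a minimal sketch assuming the definition N_eff(H, z) = Σ_i λ_i / (λ_i + z), where λ_i are the eigenvalues of the Hessian of the training loss and z > 0 is a regularization constant (this has the same form as MacKay's "effective number of parameters"). Directions with λ_i much larger than z count as roughly one dimension; flat directions with λ_i much smaller than z count as roughly zero. The function name `effective_dimensionality`, the default z = 1, and the toy Hessian are hypothetical illustrations, not code from the paper.

```python
import numpy as np


def effective_dimensionality(hessian: np.ndarray, z: float = 1.0) -> float:
    """Sketch of N_eff(H, z) = sum_i lambda_i / (lambda_i + z).

    `hessian` is assumed to be a symmetric (n_params x n_params) loss Hessian;
    `z` is an assumed regularization constant (e.g. a prior precision).
    """
    # eigvalsh is for symmetric matrices and returns real eigenvalues.
    eigvals = np.linalg.eigvalsh(hessian)
    # Treat slightly negative eigenvalues (non-convexity, numerical noise) as flat directions.
    eigvals = np.clip(eigvals, 0.0, None)
    return float(np.sum(eigvals / (eigvals + z)))


# Toy example: 1000 parameters, but only 10 directions are sharply determined by the data.
hessian = np.diag(np.concatenate([np.full(10, 1000.0), np.zeros(990)]))
print(hessian.shape[0])                    # parameter count: 1000
print(effective_dimensionality(hessian))   # roughly 10, since 10 * 1000 / 1001 ≈ 9.99
```

In this toy case the model has 1000 parameters, yet its effective dimensionality is about 10, which is the kind of gap the argument above relies on when it says parameter count is a poor proxy for capacity.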