https://arxiv.org/abs/1806.00952 gives a theoretical argument that suggests SGD will converge to a point that is very close in L2 norm to the initialization. Since NNs are often initialized with extremely small weights, this amounts to implicit L2 regularization.
My rough take: https://elicit.ought.org/builder/oTN0tXrHQ
3 buckets, similar to Ben Pace's
If I thought about this for 5 additional hours, I can imagine assigning the following ranges to the scenarios: