This is an overview for advanced readers. Main post: Information Loss --> Basin flatness
Inductive bias is related to, among other things:
In relation to basin flatness and manifold dimension:
See the main post for details.
In standard terminology, G is the Jacobian of the concatenated model outputs with respect to the parameters.
N is the number of parameters in the model. See claims 1 and 2 here for a proof sketch.
Proof sketch for rank(Hessian) = rank(G):
There is an alternate proof going through the result Hessian = 2GGᵀ, which holds exactly at zero-loss minima (where the residual terms vanish). (The factor of 2 is specific to MSE loss.)
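Both claims can be checked numerically on a toy model. The sketch below is illustrative and not from the post: it uses a hypothetical two-parameter model f(x; w) = w₀·w₁·x fit to y = x with a sum-of-squares loss, and takes G with one row per parameter and one column per output (so GGᵀ is N×N). At a zero-loss minimum, rank(Hessian) should equal rank(G), and the Hessian should equal 2GGᵀ.

```python
import numpy as np

# Hypothetical toy model for illustration: f(x; w) = w[0] * w[1] * x.
# Fitting y = x, any w with w[0] * w[1] == 1 is a zero-loss minimum.
xs = np.array([1.0, 2.0])
ys = xs.copy()
w0 = np.array([2.0, 0.5])  # a zero-loss point: 2 * 0.5 = 1

def outputs(w):
    return w[0] * w[1] * xs  # concatenated model outputs over the dataset

def loss(w):
    return np.sum((outputs(w) - ys) ** 2)  # MSE-style sum of squares

def jacobian(f, w, eps=1e-6):
    # Central finite differences; row i is d(outputs)/d(w[i]),
    # so the result has shape (num_params, num_outputs).
    rows = []
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        rows.append((f(w + d) - f(w - d)) / (2 * eps))
    return np.stack(rows, axis=0)

def hessian(f, w, eps=1e-4):
    # Second-order central differences for the (num_params x num_params) Hessian.
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            di = np.zeros(n); di[i] = eps
            dj = np.zeros(n); dj[j] = eps
            H[i, j] = (f(w + di + dj) - f(w + di - dj)
                       - f(w - di + dj) + f(w - di - dj)) / (4 * eps ** 2)
    return H

G = jacobian(outputs, w0)
H = hessian(loss, w0)

print(np.linalg.matrix_rank(G))             # 1: G is rank-deficient here
print(np.linalg.matrix_rank(H, tol=1e-4))   # 1: same rank as G
print(np.allclose(H, 2 * G @ G.T, atol=1e-3))  # True: Hessian = 2GG^T at zero loss
```

The rank deficiency is real, not an artifact: the model only depends on the product w₀·w₁, so the parameter direction that rescales w₀ and w₁ while preserving their product changes no output, giving G (and hence the Hessian) a nontrivial kernel.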