Produced under the mentorship of Evan Hubinger as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort.

Thank you to @Zach Furman, @Denizhan_Akar, and Mark Chiu Chong for feedback and discussion.

Classical learning theory is flawed.

At least part of this is due to the field's flimsy theoretical foundations. In transitioning to modeling functions rather than probability distributions and in making risk the fundamental object of study rather than the data-generating distribution, learning theory has become easier to implement but harder to think clearly about.

This is a smaller problem than the other issues I'll raise in this sequence: the framework does not, by itself, make the theory any more limited than it would otherwise be. But the machinery obscures the true probabilistic nature of learning, throws out too much information about the microscopic structure of the model class, and misdirects precious attention to bounds that yield little insight.

Ultimately, a poor theoretical framework makes for poor theoretical work. We can do better by taking inspiration from statistical physics and singular learning theory.

Learning theorists decompose risk into **approximation** (how good is the best model in your model class), **generalization** (how close is your model's risk on the dataset to its expectation value), and **optimization** (how close is the model you obtain to the best model in the model class).
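The paragraph above keeps these terms informal. One standard way to make them precise (an assumption here, not necessarily the exact definitions used later in this sequence) is a telescoping identity. It uses the population risk $R$, the empirical risk $R_n$, and the best-in-class model $f^{*}$ defined in the next section, plus two symbols introduced only for this illustration: $\hat f$, the model actually obtained from training, and $R^{\star}$, the lowest risk achievable by any function:

$$
R(\hat f) - R^{\star}
= \underbrace{R(f^{*}) - R^{\star}}_{\text{approximation}}
+ \underbrace{R(\hat f) - R_{n}(\hat f)}_{\text{generalization}}
+ \underbrace{R_{n}(\hat f) - R_{n}(f^{*})}_{\text{optimization}}
+ \underbrace{R_{n}(f^{*}) - R(f^{*})}_{\text{generalization}}.
$$

Every intermediate term cancels, so the identity holds for any $\hat f$. If $\hat f$ exactly minimizes $R_n$ over the model class, the optimization term is at most zero, because $f^{*}$ belongs to that same class.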

Empirical risk minimization

In order to understand what classical learning theory is and what it aims to accomplish, we first need to understand the underlying framework of empirical risk minimization (ERM).

This framework is about learning functions,

$$f : X \to Y.$$

In particular, we want to find the "optimal" function $f^{*}$ within some model/hypothesis class $F \subset Y^{X}$, where we define the "optimal" function as the one that minimizes the (population) risk,

$$R : Y^{X} \to \mathbb{R}.$$

The population risk evaluates $f$ on each input-output pair via a loss function $\ell : Y \times Y \to \mathbb{R}$,

$$R(f) = \int \ell(f(x), y) \, d\mu(x, y),$$

with respect to some measure $\mu$ over the joint space $X \times Y$. Here $\ell$ acts as a measure of dissimilarity: it penalizes $f$ when its prediction $f(x)$ is far from the target $y$.
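To make these definitions concrete, here is a minimal sketch under several assumptions that are not part of the original text. The joint distribution is a toy discrete one, so the integral defining $R$ reduces to a finite weighted sum. The loss is squared loss. The model class is a hypothetical finite set of lines through the origin, $f_w(x) = w x$, with $w$ on a grid. The names `MU`, `WEIGHTS`, `make_linear`, and `best_in_class` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy joint distribution mu(x, y) on a finite subset of X x Y.
# Because mu is discrete, the integral defining R(f) becomes a weighted sum.
MU: Dict[Tuple[float, float], float] = {
    (0.0, 0.0): 0.25,
    (1.0, 1.0): 0.25,
    (2.0, 2.0): 0.25,
    (3.0, 4.0): 0.25,
}


def squared_loss(y_hat: float, y: float) -> float:
    """Loss l(y_hat, y) = (y_hat - y)^2: larger when the prediction is far from y."""
    return (y_hat - y) ** 2


def population_risk(
    f: Callable[[float], float],
    mu: Dict[Tuple[float, float], float],
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """R(f) = sum over (x, y) of loss(f(x), y) * mu(x, y), for a discrete mu."""
    return sum(p * loss(f(x), y) for (x, y), p in mu.items())


def make_linear(w: float) -> Callable[[float], float]:
    """Hypothetical model f_w(x) = w * x (a line through the origin)."""
    return lambda x: w * x


# Hypothetical finite model class F: f_w for w in {0.0, 0.1, ..., 3.0}.
WEIGHTS: List[float] = [i / 10 for i in range(31)]


def best_in_class(
    weights: List[float], mu: Dict[Tuple[float, float], float]
) -> Tuple[float, float]:
    """Return (w*, R(f_{w*})), the grid weight minimizing population risk."""
    risks = {w: population_risk(make_linear(w), mu) for w in weights}
    w_star = min(risks, key=risks.get)
    return w_star, risks[w_star]


w_star, r_star = best_in_class(WEIGHTS, MU)
print(f"w* = {w_star}, R(f*) = {r_star:.4f}")
```

In this toy setup $y$ is a deterministic function of $x$, so some function achieves zero risk. However, no line through the origin passes through all four support points, so $R(f^{*})$ stays above zero. That residual is the approximation error of this particular model class, plus a small contribution from restricting $w$ to a grid.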

In practice, this integral is intractable^[1], so instead we approximate $R$ via the empirical risk,

$$R_{n}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_{i}), y_{i}),$$

where the sum is taken over a dataset, ...