Not only is realizability not guaranteed, it is also extremely unrealistic, because of the computational complexity of the real world. Furthermore, it is impossible for an agent to specify a hypothesis that has greater computational complexity than the agent itself; this is the problem of irreflexivity.
It's unclear to me exactly when irreflexivity is an actual problem for a learner. I understand that a learner cannot exactly simulate a process that is computationally more complex than the learner itself, but I'm unsure when an exact simulation is necessary for learning.
Consider...
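To make the question concrete, here is a minimal sketch (my own construction, not from the discussion above) of the standard diagonalization obstacle: an environment that contains the learner as a subroutine and always outputs the bit the learner did not predict, so no hypothesis class the learner can represent contains an exact model of it.

```python
def adversarial_environment(learner, history):
    """An environment strictly more complex than the learner:
    it simulates the learner's own code and inverts the prediction."""
    prediction = learner(history)  # run the learner on the history so far
    return 1 - prediction          # the next observation falsifies it

def majority_learner(history):
    """A toy learner: predict the majority bit seen so far."""
    return int(2 * sum(history) >= len(history))

history = []
for _ in range(10):
    history.append(adversarial_environment(majority_learner, history))

print(history)  # the learner's prediction is wrong on every single step
```

Realizability fails here not because the environment is complex in absolute terms, but because it references the learner itself; whether an approximate rather than exact self-model would suffice is exactly the kind of question raised above.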
Thanks, this clarifies many things! Thanks also for linking to your very comprehensive post on generalization.
To be clear, I didn't mean to claim that VC theory explains NN generalization. It is indeed famously bad at explaining modern ML. But "models have singularities and thus number of parameters is not a good complexity measure" is not a valid criticism of VC theory. If SLT indeed helps figure out the mysteries from the "understanding deep learning..." paper then that will be amazing!
...But what we'd really like to get at is an understanding of how pertur
Neural networks are intrinsically biased towards simpler solutions.
Am I correct in thinking that being "intrinsically biased towards simpler solutions" isn't a property of neural networks, but a property of the Bayesian learning procedure? The math in the post doesn't use much that is specific to NNs, and it seems like the same conclusions can be drawn for any model class whose loss landscape has many minima of varying complexity?
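For reference, the standard asymptotic free energy formula from singular learning theory (stated here from memory, not quoted from the post) indeed makes no NN-specific assumptions: for a Bayesian posterior over a parameter space $W$ with empirical loss $L_n$ and prior $\varphi$,

$$F_n = -\log \int_W e^{-n L_n(w)}\, \varphi(w)\, dw \approx n L_n(w_0) + \lambda \log n,$$

where $\lambda$ is the real log canonical threshold (RLCT) at a most singular optimal point $w_0$. Since a region's posterior mass scales like $e^{-F_n}$, the posterior concentrates on minima that trade off low loss against low $\lambda$, and this argument goes through for any model class whose minima have varying $\lambda$.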
Perhaps I have learnt statistical learning theory in a different order than others, but in my mind, the central theorem of statistical learning theory is that learnability is characterized by the VC-dimension of your model class (here I mean learning in the sense of supervised binary classification, but similar dimensions exist for some more general kinds of learning as well). VC-dimension is a quantity that does not even mention the number of parameters used to specify your model, but depends only on the number of different behaviours induced by the models in...
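To state the characterization precisely (from memory, as in standard treatments of the fundamental theorem of statistical learning): for binary classification with a hypothesis class $\mathcal{H}$ of VC-dimension $d$, the sample complexity of agnostic PAC learning is

$$m(\varepsilon, \delta) = \Theta\!\left(\frac{d + \log(1/\delta)}{\varepsilon^2}\right),$$

so parameter count can be arbitrarily decoupled from $d$: the one-parameter class $x \mapsto \mathrm{sign}(\sin(\theta x))$ famously has infinite VC-dimension.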
I read the paper, and overall it's an interesting framework. One thing I am somewhat unconvinced about (likely because I have misunderstood something) is its utility despite the dependence on the world model. If we prove guarantees assuming a world model, but don't know what happens if the real world deviates from the world model, then we have a problem. Ideally perhaps we want a guarantee akin to what's proved in learning theory, for example, that the error will be small for any data distribution as long as the distribution remains the same during trai...
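Spelled out, the learning-theory guarantee I have in mind takes roughly this form (my paraphrase): for every distribution $\mathcal{D}$, unknown but identical at training and test time, with probability at least $1 - \delta$ over a sample $S \sim \mathcal{D}^m$ the learner $A$ satisfies

$$L_{\mathcal{D}}(A(S)) \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \varepsilon.$$

The point is that the quantifier ranges over all distributions, rather than being conditional on a particular world model being correct.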
I was going to say something similar. After reading the first two posts of the sequence I really thought the role of credal sets in defining regret would be somewhat different.
In particular, consider $\mathcal{R}(\pi, \mu)$ to be the classical (non-infra) regret for a given policy $\pi$ on a given environment $\mu$. For a given environment class $\mathcal{M}$, we previously considered two notions of learnability depending on the kind of uncertainty we had over $\mathcal{M}$. First, under Knightian uncertainty, we required that our policy satisfy $\max_{\mu \in \mathcal{M}} \mathcal{R}(\pi, \mu) \to 0$, and under Bayesian uncertainty, we required $\mathbb{E}_{\mu \sim \zeta}[\mathcal{R}(\pi, \mu)] \to 0$ for a prior $\zeta$ over $\mathcal{M}$.[1] A credal set gives...