The generalization error of overparametrized models: Insights from exact asymptotics
Andrea Montanari (Stanford)
Abstract: In a canonical supervised learning setting, we are given n data samples, each comprising a feature vector and a label, or response variable. We are asked to learn a function f that can predict the label associated to a new (unseen) feature vector. How is it possible that the model learnt from observed data generalizes to new points? Classical learning theory assumes that data points are drawn i.i.d. from a common distribution and argues that this phenomenon is a consequence of uniform convergence: the training error is close to its expectation uniformly over all models in a certain class. Modern deep learning systems appear to defy this viewpoint: they achieve training error that is significantly smaller than the test error, and yet generalize well to new data. I will present a sequence of high-dimensional examples in which this phenomenon can be understood in detail. [Based on joint work with Song Mei, Feng Ruan, Youngtak Sohn, Jun Yan]
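The contrast the abstract describes, near-zero training error together with reasonable test error, can be reproduced in a small toy experiment. The sketch below is purely illustrative and is not one of the examples analyzed in the talk: it fits a minimum-norm least-squares interpolator over random ReLU features with more parameters than samples; the dimensions, noise level, and feature map are assumptions chosen only for the demonstration.

```python
# Illustrative sketch (assumed setup, not the speaker's construction):
# overparametrized interpolation with random ReLU features.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 30, 2000      # samples, input dimension, number of random features (p >> n)
sigma = 0.5                  # label noise level (assumed for this toy example)

# Ground-truth linear target with noisy labels
beta = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ beta + sigma * rng.normal(size=n)
X_test = rng.normal(size=(5 * n, d))
y_test = X_test @ beta + sigma * rng.normal(size=5 * n)

# Random ReLU feature map: phi(x) = max(W^T x, 0)
W = rng.normal(size=(d, p)) / np.sqrt(d)
Phi_train = np.maximum(X_train @ W, 0.0)
Phi_test = np.maximum(X_test @ W, 0.0)

# Minimum-norm interpolating solution (lstsq returns it for underdetermined systems)
theta, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)

train_mse = np.mean((Phi_train @ theta - y_train) ** 2)
test_mse = np.mean((Phi_test @ theta - y_test) ** 2)
print(f"training MSE: {train_mse:.2e}   test MSE: {test_mse:.3f}")
```

Running this typically shows a training error at numerical-precision scale while the test error stays of the order of the noise variance, which is the qualitative behavior the abstract refers to.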
optimization and control, statistics theory
Audience: researchers in the topic
Series comments: Research seminar on data science
Organizers: Afonso S. Bandeira* (contact for this listing), Joan Bruna, Carlos Fernandez-Granda, Jonathan Niles-Weed, Ilias Zadik
