Two mathematical lessons of deep learning

Mikhail Belkin (Halicioğlu Data Science Institute, University of California San Diego)

28-Apr-2021, 17:00-18:00

Abstract: Recent empirical successes of deep learning have exposed significant gaps in our fundamental understanding of learning and optimization mechanisms. Modern best practices for model selection are in direct contradiction to the methodologies suggested by classical analyses. Similarly, the efficiency of the SGD-based local methods used to train modern models appears at odds with standard intuitions about optimization.

First, I will present evidence, both empirical and mathematical, that classical notions such as over-fitting need to be revisited. I will then discuss the emerging understanding of generalization and, in particular, the "double descent" risk curve, which extends the classical U-shaped generalization curve beyond the interpolation threshold.
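
As a minimal numerical illustration of the double-descent shape (a sketch under illustrative assumptions, not material from the talk), minimum-norm least squares on random Fourier features typically shows the test error rising as the number of features approaches the number of training points and falling again past that interpolation threshold:

# Sketch (illustrative, not from the talk): double descent with random
# Fourier features and the minimum-norm least-squares solution.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 40, 500
x_train = rng.uniform(-1, 1, n_train)
x_test = rng.uniform(-1, 1, n_test)
target = lambda x: np.sin(2 * np.pi * x)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
y_test = target(x_test)

max_features = 400
freq = rng.normal(0, 10, max_features)       # random frequencies
phase = rng.uniform(0, 2 * np.pi, max_features)

def features(x, p):
    # Random Fourier features: cos(freq_j * x + phase_j), j = 1..p
    return np.cos(np.outer(x, freq[:p]) + phase[:p])

for p in [5, 10, 20, 30, 40, 60, 100, 200, 400]:
    # Pseudoinverse gives the minimum-norm solution; it interpolates once p >= n_train
    w = np.linalg.pinv(features(x_train, p)) @ y_train
    mse = np.mean((features(x_test, p) @ w - y_test) ** 2)
    print(f"features={p:4d}  test MSE={mse:.3f}")

In a typical run the test error peaks near features=40, i.e. at the interpolation threshold where the number of features equals the number of training points, and decreases again as the model becomes over-parameterized.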

Second, I will discuss why the loss landscapes of over-parameterized neural networks are generically never convex, even locally. Instead, as I will argue, they satisfy the Polyak-Łojasiewicz condition across most of the parameter space, which allows SGD-type methods to converge to a global minimum.
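
For reference, a standard statement of the Polyak-Łojasiewicz (PL) condition and the convergence it implies (textbook facts, not specific claims from the abstract): a differentiable loss $L$ with infimum $L^{*}$ satisfies the PL condition with parameter $\mu > 0$ if
\[
\tfrac{1}{2}\,\|\nabla L(w)\|^{2} \;\ge\; \mu\,\bigl(L(w) - L^{*}\bigr) \qquad \text{for all } w .
\]
If $L$ is in addition $\beta$-smooth, gradient descent with step size $1/\beta$ satisfies
\[
L(w_{t+1}) - L^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)\bigl(L(w_{t}) - L^{*}\bigr),
\]
so the excess loss decays geometrically to a global minimum even though $L$ need not be convex.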

A key piece of the puzzle remains: how does optimization align with statistics to form a complete mathematical picture of modern ML?

Topics: data structures and algorithms, machine learning, mathematical physics, information theory, optimization and control, data analysis, statistics and probability

Audience: researchers in the topic



Mathematics, Physics and Machine Learning (IST, Lisbon)

Series comments: To receive the series announcements, please register at:
mpml.tecnico.ulisboa.pt
mpml.tecnico.ulisboa.pt/registration
Zoom link: videoconf-colibri.zoom.us/j/91599759679

Organizers: Mário Figueiredo, Tiago Domingos, Francisco Melo, Jose Mourao*, Cláudia Nunes, Yasser Omar, Pedro Alexandre Santos, João Seixas, Cláudia Soares, João Xavier
*contact for this listing
