On the convergence of gradient descent for wide two-layer neural networks

Francis Bach (Inria, FR)

18-May-2020, 13:00-13:45

Abstract: Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Based on joint work with Lénaïc Chizat, arxiv.org/abs/1805.09545, arxiv.org/abs/2002.04486)
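As a rough illustration of the setting described in the abstract (not of the talk's results or proofs), the sketch below runs plain gradient descent on a two-layer network with ReLU activation, which is positively homogeneous, using a large hidden layer and the 1/m mean-field scaling. All data, dimensions, and step sizes are invented for this sketch.

import numpy as np

# Illustrative sketch only: full-batch gradient descent on a wide two-layer
# network f(x) = (1/m) * sum_j a_j * relu(w_j . x), where ReLU is a positively
# homogeneous activation and the 1/m factor is the mean-field scaling used
# when the number of hidden neurons m grows large.
# Data, width, and step size are arbitrary choices made for this example.
rng = np.random.default_rng(0)
n, d, m = 200, 5, 2000            # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])              # toy target in {-1, +1}

W = rng.standard_normal((m, d))   # hidden-layer weights w_j
a = rng.standard_normal(m)        # output weights a_j
lr = 0.1 * m                      # step size rescaled by m so the updates do
                                  # not vanish as the hidden layer widens

for step in range(500):
    Z = X @ W.T                   # (n, m) pre-activations
    H = np.maximum(Z, 0.0)        # ReLU activations
    resid = H @ a / m - y         # residuals of the squared loss
    grad_a = H.T @ resid / (n * m)
    grad_W = ((resid[:, None] * (Z > 0) * a[None, :]).T @ X) / (n * m)
    a -= lr * grad_a
    W -= lr * grad_W

loss = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a / m - y) ** 2)
print("final squared loss:", loss)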

Topics: analysis of PDEs, functional analysis, general mathematics, numerical analysis, optimization and control, probability, statistics theory

Audience: researchers in the topic

Comments: Please note that this is a joint talk with the One World Optimization Seminar.


One World seminar: Mathematical Methods for Arbitrary Data Sources (MADS)

Series comments: Research seminar on mathematics for data

The lecture series will collect talks on mathematical disciplines related to all kinds of data, ranging from statistics and machine learning to model-based approaches and inverse problems. Each pair of talks will address a specific direction, e.g., a NoMADS session on nonlocal approaches or a DeepMADS session on deep learning.

Approximately 15 minutes before the lecture begins, a Zoom link will be provided on the official website and via the mailing list. For further details, please visit our webpage.

Organizers: Leon Bungert*, Martin Burger, Antonio Esposito*, Janic Föcke, Daniel Tenbrinck, Philipp Wacker
*contact for this listing
