A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Networks
Rong Ge (Duke University)
Abstract: The training of neural networks optimizes complex non-convex objective functions, yet in practice simple algorithms achieve great performance. Recent works suggest that over-parameterization could be a key ingredient in explaining this discrepancy. However, current theories cannot fully explain the role of over-parameterization: they either work in a regime where neurons don't move much, or require a large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show that as long as the loss is already below a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least the number of teacher neurons, and gives explicit bounds on convergence rates that are independent of the number of student neurons. Based on joint work with Mo Zhou and Chi Jin.
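To make the setting in the abstract concrete, here is a minimal sketch (not the authors' code) of a teacher-student two-layer network: a fixed teacher with n neurons, a mildly over-parameterized student with m >= n neurons, and plain gradient descent on the squared loss. All names and values (n, m, the ReLU activation, the step size, the data distribution) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 10, 4, 6          # input dim, teacher neurons, student neurons (m >= n)
relu = lambda z: np.maximum(z, 0.0)

# Fixed teacher network: f*(x) = sum_j a*_j relu(<w*_j, x>)
W_teacher = rng.standard_normal((n, d))
a_teacher = rng.standard_normal(n)

# Student network of the same form, but with m neurons.
W = 0.1 * rng.standard_normal((m, d))
a = 0.1 * rng.standard_normal(m)

def forward(W, a, X):
    return relu(X @ W.T) @ a            # shape (batch,)

X = rng.standard_normal((2000, d))      # inputs for an empirical squared loss
y = forward(W_teacher, a_teacher, X)    # labels produced by the teacher

lr = 0.05
for step in range(3000):
    H = relu(X @ W.T)                   # hidden activations, shape (batch, m)
    err = H @ a - y                     # residual of the squared loss
    grad_a = H.T @ err / len(X)
    grad_W = ((err[:, None] * (H > 0) * a).T @ X) / len(X)
    a -= lr * grad_a
    W -= lr * grad_W

print("final squared loss:", np.mean((forward(W, a, X) - y) ** 2))
```

In the regime the talk describes, once the loss of such a student falls below a (polynomially small) threshold, each student neuron tracks one of the teacher neurons and the loss continues down to 0.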
statistics theory
Audience: researchers in the field
Stochastics and Statistics Seminar Series
Series comments: MIT seminar on statistics, data science and related topics
Organizers: Philippe Rigollet* and Sasha Rakhlin (*contact for this listing)
