A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Networks
Rong Ge (Duke University)
Abstract: The training of neural networks optimizes complex non-convex objective functions, yet in practice simple algorithms achieve great performance. Recent works suggest that over-parameterization could be a key ingredient in explaining this discrepancy. However, current theories cannot fully explain the role of over-parameterization: they either work in a regime where neurons don't move much, or require a large number of neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural networks. We show that as long as the loss is already below a threshold (polynomial in the relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of the teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least the number of teacher neurons, and gives explicit bounds on convergence rates that are independent of the number of student neurons. Based on joint work with Mo Zhou and Chi Jin.
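To make the setting in the abstract concrete, here is a minimal sketch (not the authors' code) of a teacher-student two-layer network: a fixed teacher with n neurons, a mildly over-parameterized student with m >= n neurons, and plain gradient descent on the squared loss. All names and values (n, m, the ReLU activation, the step size, the data distribution) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 10, 4, 6          # input dim, teacher neurons, student neurons (m >= n)
relu = lambda z: np.maximum(z, 0.0)

# Fixed teacher network: f*(x) = sum_j a*_j relu(<w*_j, x>)
W_teacher = rng.standard_normal((n, d))
a_teacher = rng.standard_normal(n)

# Student network of the same form, but with m neurons.
W = 0.1 * rng.standard_normal((m, d))
a = 0.1 * rng.standard_normal(m)

def forward(W, a, X):
    return relu(X @ W.T) @ a            # shape (batch,)

X = rng.standard_normal((2000, d))      # inputs for an empirical squared loss
y = forward(W_teacher, a_teacher, X)    # labels produced by the teacher

lr = 0.05
for step in range(3000):
    H = relu(X @ W.T)                   # hidden activations, shape (batch, m)
    err = H @ a - y                     # residual of the squared loss
    grad_a = H.T @ err / len(X)
    grad_W = ((err[:, None] * (H > 0) * a).T @ X) / len(X)
    a -= lr * grad_a
    W -= lr * grad_W

print("final squared loss:", np.mean((forward(W, a, X) - y) ** 2))
```

In the regime the talk describes, once the loss of such a student falls below a (polynomially small) threshold, each student neuron tracks one of the teacher neurons and the loss continues down to 0.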
statistics theory
Audience: researchers in the field
Stochastics and Statistics Seminar Series
Series comments: MIT seminar on statistics, data science and related topics
Organizers: Philippe Rigollet* and Sasha Rakhlin (*contact for this listing)
