Renormalizing the Optimal Hyperparameters of a Neural Network

Greg Yang (Microsoft Research)

22-Mar-2022, 18:30-19:30

Abstract: Hyperparameter tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters that often can only be trained once. We show that, in the recently discovered Maximal Update Parametrization (μP), many optimal hyperparameters remain stable even as model size changes. Using this insight, for example, we are able to re-tune the 6.7-billion-parameter model of GPT-3 and obtain performance comparable to the 13-billion-parameter model of GPT-3, effectively doubling the model size.

In this context, there is a rich analogy we can make to Wilsonian effective field theory. For example, if “coupling constants” in physics correspond to “optimal hyperparameters” in deep learning and “cutoff scale” corresponds to “model size”, then we can say “μP is a renormalizable theory of neural networks.” We finish by formulating the question of whether there is a “Grand Unifying Theory” of neural networks at scale that can inform our quest toward general intelligence.
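To make the hyperparameter-transfer workflow from the abstract concrete, here is a minimal sketch in PyTorch using the open-source `mup` package (pip install mup), which provides MuReadout, set_base_shapes, and MuAdam. The MLP architecture, widths, learning rate, and random data below are illustrative assumptions, not the speaker's actual setup: the idea is simply to sweep hyperparameters on a narrow proxy model and reuse them unchanged on a much wider model.

# Minimal sketch: tune the learning rate on a narrow "proxy" MLP in muP,
# then reuse the same learning rate for a much wider model.
# Assumes the `mup` package; model, widths, and data are placeholders.
import torch
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam


class MLP(nn.Module):
    def __init__(self, width: int, d_in: int = 32, d_out: int = 10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                  nn.Linear(width, width), nn.ReLU())
        # MuReadout replaces the final nn.Linear so the output layer
        # receives the muP-specific 1/width scaling.
        self.head = MuReadout(width, d_out)

    def forward(self, x):
        return self.head(self.body(x))


def make_model(width: int) -> MLP:
    model = MLP(width)
    # The base and delta models only supply shape information so that
    # `mup` can infer which dimensions scale with width.
    set_base_shapes(model, MLP(width=64), delta=MLP(width=128))
    return model


# Step 1: sweep hyperparameters on a cheap, narrow proxy model.
best_lr = 1e-2  # placeholder for the result of a small-scale sweep

# Step 2: train the wide target model with the *same* hyperparameters.
# Under muP, the optimum is approximately stable across widths,
# so no new sweep is needed at the large scale.
wide = make_model(width=4096)
opt = MuAdam(wide.parameters(), lr=best_lr)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(wide(x), y)
loss.backward()
opt.step()

In standard parametrization the optimal learning rate drifts as width grows, so step 2 would require a fresh (and expensive) sweep; the talk's claim is that muP removes that drift.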

HEP - phenomenology, HEP - theory, mathematical physics

Audience: researchers in the topic


NHETC Seminar

Series comments: Weekly research seminar of the NHETC at Rutgers University

Livestream link is available on the webpage.

Organizers: Christina Pettola*, Sung Hak Lim, Vivek Saxena*, Erica DiPaola*
*contact for this listing
