BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Greg Yang (Microsoft Research)
DTSTART:20220322T183000Z
DTEND:20220322T193000Z
DTSTAMP:20260423T005742Z
UID:nhetc/34
DESCRIPTION:Title: Renormalizing the Optimal Hyperparameters of a Neural
  Network (https://researchseminars.org/talk/nhetc/34/)\nby Greg Yang (Mi
 crosoft Research) as part of NHETC Seminar\n\nAbstract\nHyperpar
 ameter tuning in deep learning is an expensive process\, prohibitively so 
 for neural networks (NNs) with billions of parameters that often can only 
 be trained once. We show that\, in the recently discovered Maximal Update 
 Parametrization (μP)\, many optimal hyperparameters remain stable even as
  model size changes. Using this insight\, for example\, we are able to re
 -tune the 6.7-billion-parameter model of GPT-3 and obtain performance comp
 arable to the 13-billion-parameter model of GPT-3\, effectively doubling t
 he model size.\n\nIn this context\, there is a rich analogy with
  Wilsonian effective field theory. For example\, if “coupling constants
 ” in physics correspond to “optimal hyperparameters” in deep learnin
 g and “cutoff scale” corresponds to “model size”\, then we can say
  “μP is a renormalizable theory of neural networks.” We finish by for
 mulating the question of whether there is a “Grand Unifying Theory” of
  neural networks at scale that can inform our quest toward general intelli
 gence.\n
LOCATION:https://researchseminars.org/talk/nhetc/34/
END:VEVENT
END:VCALENDAR
