BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Ard Louis (University of Oxford)
DTSTART:20210702T130000Z
DTEND:20210702T140000Z
DTSTAMP:20260423T021042Z
UID:MPML/50
DESCRIPTION:Title: Deep neural networks have an inbuilt Occam's razor\nb
 y Ard Louis (University of Oxford) as part of Mathematics\, Physics an
 d Machine Learning (IST\, Lisbon)\n\nAbstract\nOne of the most surpris
 ing properties of deep neural networks (DNNs) is that they perform bes
 t in the overparameterised regime. We are taught early on that having m
 ore parameters than data points is a terrible idea. So why do DNNs wor
 k so well in a regime where classical learning theory predicts they sh
 ould heavily overfit? By adapting the coding theorem from algorithmic i
 nformation theory (which every physicist should learn about!)\, we sho
 w that DNNs are exponentially biased at initialisation towards functio
 ns that have low descriptional (Kolmogorov) complexity. In other words
 \, DNNs have an inbuilt Occam's razor\, a bias towards simple function
 s. We next show that stochastic gradient descent (SGD)\, the most popu
 lar optimisation method for DNNs\, follows the same bias\, and so does n
 ot itself explain the good generalisation of DNNs. Our approach natura
 lly leads to a marginal-likelihood PAC-Bayes generalisation bound whic
 h performs better than any other bound on the market. Finally\, we dis
 cuss why this bias towards simplicity allows DNNs to perform so well\, a
 nd speculate on what this may tell us about the natural world.\n
LOCATION:https://researchseminars.org/talk/MPML/50/
END:VEVENT
END:VCALENDAR
