BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Tom Jacobs (CISPA Helmholtz Center for Information Security)
DTSTART:20260209T080000Z
DTEND:20260209T090000Z
DTSTAMP:20260423T153345Z
UID:TropicalmathandML/32
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/Tropicalmath
 andML/32/">Weight Decay Controls Implicit Regularization:  Insights on Gen
 eralization and Sparsity</a>\nby Tom Jacobs (CISPA Helmholtz Center for In
 formation Security) as part of Tropical mathematics and machine learning\n
 \n\nAbstract\nClassical statistics teaches us that overparameterization ca
 uses overfitting\, which prevents good generalization. However\, highly ov
 erparameterized neural network architectures generalize surprisingly well.
  This is because the training of these models tends towards low-rank or sp
 arse solutions\, without requiring explicit constraints. This preference i
 s known as implicit regularization\, and it can be found in a variety of c
 ontexts\, including attention layers\, LoRA\, matrix sensing\, and diagona
 l linear networks. As a result\, implicit regularization helps explain how
  overfitting is avoided and generalization is improved in neural networks.
 \n\nIn this work I will show how weight decay controls implicit regulariza
 tion beyond its explicit role of constraining the model capacity. For inst
 ance\, it moves the implicit regularizer from $L_2$ to $L_1$\, which leads
  to more sparsity in the model. This demonstrates how weight decay not onl
 y serves as a model constraint\, but also has an implicit effect. When we
 ight decay is turned off during training\, only the implicit effect rema
 ins\, resulting in better generalization overall. Beyond improving gener
 alization\, I use these insights to induce sparsity in deep neural netwo
 rks. Sparsification aims to reduce model size and inference time by remo
 ving as many weights as possible. This results in a new method from our p
 revious work: PILoT (Parametric Implicit Lottery Ticket)\, a sparsificat
 ion approach based on overparameterization and weight decay that uses th
 e transition of the implicit regularization from $L_2$ to $L_1$ to gradu
 ally sparsify\, achieving high sparsity with a smaller performance drop.\n
 \nTheoretically\, we build on the connection between reparameterizations (
 a specific form of overparameterization) and mirror flows (Riemannian gra
 dient flows) and extend it to time-varying mirror flows. The mirror flow c
 ontrols the implicit bias\, and weight decay in turn controls the time-va
 rying mirror flow.\n
LOCATION:https://researchseminars.org/talk/TropicalmathandML/32/
END:VEVENT
END:VCALENDAR
