BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Gonçalo Correia (IST and Priberam Labs)
DTSTART:20230309T170000Z
DTEND:20230309T180000Z
DTSTAMP:20260423T021041Z
UID:MPML/102
DESCRIPTION:Title: Learnable Sparsity and Weak Supervision for
  Data-Efficient\, Transparent\, and Compact Neural Models
  (https://researchseminars.org/talk/MPML/102/)\nby Gonçalo Correia
  (IST and Priberam Labs) as part of Mathematics\, Physics and
  Machine Learning (IST\, Lisbon)\n\nAbstract\nNeural network models
  have become ubiquitous in the machine learning literature. These
  models are compositions of differentiable building blocks that
  result in dense representations of the underlying data. To obtain
  good representations\, conventional neural models require many
  training data points. Moreover\, those representations\, albeit
  capable of achieving high performance on many tasks\, are largely
  uninterpretable. These models are often overparameterized and
  produce representations that do not compactly represent the data.
  To address these issues\, we find solutions in sparsity and various
  forms of weak supervision. For data efficiency\, we leverage
  transfer learning as a form of weak supervision. On a
  sequence-to-sequence generation task\, the proposed model performs
  comparably to models trained on millions of data points\, even
  though we train it on only a few thousand. For transparency\, we
  propose a probability normalizing function that learns its own
  sparsity. The model learns the sparsity it needs differentiably and
  thus adapts it to the data according to each neural component's
  role in the overall structure. We show that the proposed model
  improves the interpretability of a popular neural machine
  translation architecture when compared to conventional probability
  normalizing functions. Finally\, for compactness\, we uncover a way
  to obtain exact gradients of discrete and structured latent
  variable models efficiently. The discrete nodes in these models can
  compactly represent implicit clusters and structures in the data\,
  but training them has typically been complex and prone to failure\,
  since it requires approximations that rely on sampling or
  relaxations. We propose to train these models with exact gradients
  by parameterizing discrete distributions with sparse functions\,
  both unstructured and structured. We obtain good performance on
  three latent variable model applications while retaining the
  practicality of the approximations mentioned above. Through these
  novel contributions\, we challenge the conventional wisdom that
  neural models cannot be data-efficient\, transparent\, or compact.\n
LOCATION:https://researchseminars.org/talk/MPML/102/
END:VEVENT
END:VCALENDAR
