BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:André F. T. Martins (Instituto Superior Técnico)
DTSTART:20220224T163000Z
DTEND:20220224T173000Z
DTSTAMP:20260423T003238Z
UID:MPML/69
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MPML/69/">Fr
 om Sparse Modeling to Sparse Communication</a>\nby André F. T. Martins (I
 nstituto Superior Técnico) as part of Mathematics\, Physics and Machine L
 earning (IST\, Lisbon)\n\n\nAbstract\nNeural networks and other machine le
 arning models compute continuous representations\, while humans communicat
 e mostly through discrete symbols. Reconciling these two forms of communic
 ation is desirable for generating human-readable interpretations or learni
 ng discrete latent variable models\, while maintaining end-to-end differen
 tiability.\n\nIn the first part of the talk\, I will describe how sparse m
 odeling techniques can be extended and adapted to facilitate sparse com
 munication in neural models. The building block is a family of sparse tran
 sformations called alpha-entmax\, a drop-in replacement for softmax\, whic
 h contains sparsemax as a particular case. Entmax transformations are diff
 erentiable and (unlike softmax) they can return sparse probability distrib
 utions\, useful for building interpretable attention mechanisms. Variants of
 these sparse transformations have been successfully applied to machine tran
 slation\, natural language inference\, visual question answering\, and oth
 er tasks.\n\nIn the second part\, I will introduce mixed random variables\
 , which are in-between the discrete and continuous worlds. We build rigoro
 us theoretical foundations for these hybrids\, via a new “direct sum” 
 base measure defined on the face lattice of the probability simplex. From 
 this measure\, we introduce new entropy and Kullback-Leibler divergence fu
 nctions that subsume the discrete and differential cases and have interpre
 tations in terms of code optimality. Our framework suggests two strategies
  for representing and sampling mixed random variables\, an extrinsic one
 (“sample-and-project”) and an intrinsic one (based on face stratification).
 \n\nIn the third part\, I will show how sparse transformations can also be
  used to design new loss functions\, replacing the cross-entropy loss. To 
 this end\, I will introduce the family of Fenchel-Young losses\, revealing
  connections between generalized entropy regularizers and separation margi
 ns. I will illustrate with applications in natural language generation\, mo
 rphology\, and machine translation.\n\nThis work was funded by the DeepSPI
 N ERC project - https://deep-spin.github.io\n
LOCATION:https://researchseminars.org/talk/MPML/69/
END:VEVENT
END:VCALENDAR
