BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Philippe Rigollet (MIT)
DTSTART:20200408T140000Z
DTEND:20200408T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/1
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/1/">
 Statistical and Computational aspects of Wasserstein Barycenters</a>\nby P
 hilippe Rigollet (MIT) as part of MAD+\n\n\nAbstract\nThe notion of averag
 e is central to most statistical methods. In this talk we study a generali
 zation of this notion over the non-Euclidean space of probability measures
  equipped with a certain Wasserstein distance. This generalization is ofte
 n called Wasserstein Barycenters\, and empirical evidence suggests that th
 ese barycenters allow one to capture interesting notions of averages in gr
 aphics\, data assimilation\, and morphometrics. However\, the statistical 
 (rates of convergence) and computational (efficient algorithms) aspects of
  these Wasserstein barycenters are largely unexplored. The goal of this ta
 lk is to review two recent results: 1. fast rates of convergence for empir
 ical barycenters in general geodesic spaces\, and 2. provable guarantees f
 or gradient descent and stochastic gradient descent to compute Wasserstein
  barycenters. Both results leverage geometric aspects of optimal transport
 . Based on joint works (arXiv:1908.00828\, arXiv:2001.01700) with Chewi\, 
 Le Gouic\, Maunu\, Paris\, and Stromme.\n
LOCATION:https://researchseminars.org/talk/MADPlus/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Francis Bach (INRIA/ENS)
DTSTART:20200520T140000Z
DTEND:20200520T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/2
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/2/">
 On the effectiveness of Richardson extrapolation in machine learning</a>\n
 by Francis Bach (INRIA/ENS) as part of MAD+\n\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/MADPlus/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:David Gamarnik (MIT)
DTSTART:20200422T140000Z
DTEND:20200422T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/3
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/3/">
 Overlap gap property: a provable barrier to fast optimization in probabili
 stic combinatorial structures</a>\nby David Gamarnik (MIT) as part of MAD+
 \n\n\nAbstract\nMany combinatorial optimization problems defined on random
  instances exhibit an apparent gap between the optimal values\, which can 
 be computed by non-constructive means\, and the best values achievable by 
 fast (polynomial time) algorithms. Through a combined effort of mathematic
 ians\, computer scientists\, and statistical physicists\, it became appare
 nt that a potential barrier to designing fast algorithms bridging this gap
  is an intricate topology of nearly optimal solutions\, in particular the 
 presence of the Overlap Gap Property (OGP)\, which we will introduce in th
 is talk. We will discuss how for many such problems the onset of the OGP p
 hase transition indeed introduces a provable barrier to a broad class of p
 olynomial time algorithms. Examples of such problems include finding a lar
 gest independent set of a random graph\, finding a largest cut in a random
  hypergraph\, finding a ground state of a p-spin model\, and many problems
  in the field of high-dimensional statistics. In this talk we will demonst
 rate in particular why the OGP is a barrier for three classes of algorithm
 s designed to find a near ground state in p-spin models arising in the fie
 ld of spin glass theory: Approximate Message Passing algorithms\, algorith
 ms based on low-degree polynomials\, and Langevin dynamics. Joint work wit
 h Aukosh Jagannath and Alex Wein.\n
LOCATION:https://researchseminars.org/talk/MADPlus/3/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lenka Zdeborova (CNRS)
DTSTART:20200527T140000Z
DTEND:20200527T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/4
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/4/">
 Understanding machine learning via exactly solvable statistical physics mo
 dels</a>\nby Lenka Zdeborova (CNRS) as part of MAD+\n\n\nAbstract\nThe aff
 inity between statistical physics and machine learning has a long history\
 ; this is reflected even in the machine learning terminology\, which is in
  part adopted from physics. I will describe the main lines of this long-la
 sting friendship in the context of current theoretical challenges and open
  questions about deep learning. Theoretical physics often proceeds in term
 s of solvable synthetic models\; I will describe the related line of work 
 on solvable models of simple feed-forward neural networks. I will highligh
 t a path forward to capture the subtle interplay between the structure of 
 the data\, the architecture of the network\, and the learning algorithm.\n
LOCATION:https://researchseminars.org/talk/MADPlus/4/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ingrid Daubechies (Duke)
DTSTART:20200603T140000Z
DTEND:20200603T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/5
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/5/">
 Diffusion Methods in Manifold and Fibre Bundle Learning</a>\nby Ingrid Dau
 bechies (Duke) as part of MAD+\n\n\nAbstract\nDiffusion methods help under
 stand and denoise data sets\; when there is additional structure (as is of
 ten the case)\, one can use (and get additional benefit from) a fiber bund
 le model.\n
LOCATION:https://researchseminars.org/talk/MADPlus/5/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Sara van de Geer (ETHZ)
DTSTART:20200429T140000Z
DTEND:20200429T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/6
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/6/">
 Total variation regularization</a>\nby Sara van de Geer (ETHZ) as part of 
 MAD+\n\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/MADPlus/6/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Andrea Montanari (Stanford)
DTSTART:20200610T140000Z
DTEND:20200610T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/7
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/7/">
 The generalization error of overparametrized models: Insights from exact a
 symptotics</a>\nby Andrea Montanari (Stanford) as part of MAD+\n\n\nAbstra
 ct\nIn a canonical supervised learning setting\, we are given n data sampl
 es\, each comprising a feature vector and a label\, or response variable. 
 We are asked to learn a function f that can predict the label associated t
 o a new\, unseen feature vector. How is it possible that the model learnt 
 from observed data generalizes to new points? Classical learning theory as
 sumes that data points are drawn i.i.d. from a common distribution and arg
 ues that this phenomenon is a consequence of uniform convergence: the trai
 ning error is close to its expectation uniformly over all models in a cert
 ain class. Modern deep learning systems appear to defy this viewpoint: the
 y achieve training error that is significantly smaller than the test error
 \, and yet generalize well to new data. I will present a sequence of high-
 dimensional examples in which this phenomenon can be understood in detail.
  [Based on joint work with Song Mei\, Feng Ruan\, Youngtak Sohn\, and Jun 
 Yan]\n
LOCATION:https://researchseminars.org/talk/MADPlus/7/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Emmanuel Candes (Stanford)
DTSTART:20200512T180000Z
DTEND:20200512T190000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/8
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/8/">
 Reliable predictions? Equitable treatment? Some recent progress in predict
 ive inference</a>\nby Emmanuel Candes (Stanford) as part of MAD+\n\n\nAbst
 ract\nRecent progress in machine learning (ML) provides us with many poten
 tially effective tools to learn from datasets of ever-increasing sizes and
  make useful predictions. How do we know that these tools can be trusted i
 n critical and high-sensitivity systems? If a learning algorithm predicts 
 the GPA of a prospective college applicant\, what guarantees do I have con
 cerning the accuracy of this prediction? How do we know that it is not bia
 sed against certain groups of applicants? This talk introduces statistical
  ideas to ensure that the learned models satisfy some crucial properties\,
  especially reliability and fairness (in the sense that the models need to
  apply to individuals in an equitable manner). To achieve these important 
 objectives\, we shall not ‘open up the black box’ and try to understand i
 ts underpinnings. Rather\, we discuss broad methodologies (conformal infer
 ence\, quantile regression\, and the Jackknife+) that can be wrapped aroun
 d any black box so as to produce results that can be trusted and are equit
 able.\n
LOCATION:https://researchseminars.org/talk/MADPlus/8/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Aviv Regev (Broad Institute\, MIT/Harvard)
DTSTART:20200617T140000Z
DTEND:20200617T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/9
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/9/">
 Design for Inference and the power of random experiments in biology</a>\nb
 y Aviv Regev (Broad Institute\, MIT/Harvard) as part of MAD+\n\nAbstract: 
 TBA\n
LOCATION:https://researchseminars.org/talk/MADPlus/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Andrea Montanari (Stanford)
DTSTART:20200624T140000Z
DTEND:20200624T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/10
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/10/"
 >The generalization error of overparametrized models: Insights from exact 
 asymptotics</a>\nby Andrea Montanari (Stanford) as part of MAD+\n\n\nAbstr
 act\nIn a canonical supervised learning setting\, we are given n data samp
 les\, each comprising a feature vector and a label\, or response variable.
  We are asked to learn a function f that can predict the label associated 
 to a new\, unseen feature vector. How is it possible that the model learnt
  from observed data generalizes to new points? Classical learning theory a
 ssumes that data points are drawn i.i.d. from a common distribution and ar
 gues that this phenomenon is a consequence of uniform convergence: the tra
 ining error is close to its expectation uniformly over all models in a cer
 tain class. Modern deep learning systems appear to defy this viewpoint: th
 ey achieve training error that is significantly smaller than the test erro
 r\, and yet generalize well to new data. I will present a sequence of high
 -dimensional examples in which this phenomenon can be understood in detail
 . [Based on joint work with Song Mei\, Feng Ruan\, Youngtak Sohn\, and Jun
  Yan]\n
LOCATION:https://researchseminars.org/talk/MADPlus/10/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mahdi Soltanolkotabi (USC)
DTSTART:20200708T140000Z
DTEND:20200708T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/11
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/11/"
 >Learning via early stopping and untrained neural nets</a>\nby Mahdi Solta
 nolkotabi (USC) as part of MAD+\n\n\nAbstract\nModern neural networks are 
 typically trained in an over-parameterized regime where the parameters of 
 the model far exceed the size of the training data. Such neural networks i
 n principle have the capacity to (over)fit any set of labels\, including s
 ignificantly corrupted ones. Despite this (over)fitting capacity\, over-pa
 rameterized networks have an intriguing robustness capability: they are su
 rprisingly robust to label noise when first-order methods with early stopp
 ing are used to train them. Even more surprisingly\, one can remove noise 
 and corruption from a natural image without using any training data whats
 oever\, by simply fitting (via gradient descent) a randomly initialized\, 
 over-parameterized convolutional generator to a single corrupted image. In
  this talk I will first present theoretical results aimed at explaining th
 e robustness capability of neural networks when trained via early-stopped 
 gradient descent. I will then present results towards demystifying untrain
 ed networks for image reconstruction/restoration tasks such as denoising a
 nd those arising in inverse problems such as compressive sensing.\n
LOCATION:https://researchseminars.org/talk/MADPlus/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Samory Kpotufe (Columbia)
DTSTART:20200715T140000Z
DTEND:20200715T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/12
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/12/"
 >Some recent insights on transfer-learning</a>\nby Samory Kpotufe (Columbi
 a) as part of MAD+\n\n\nAbstract\nA common situation in Machine Learning i
 s one where training data is not fully representative of a target populati
 on due to bias in the sampling mechanism or high costs in sampling the tar
 get population\; in such situations\, we aim to ‘transfer’ relevant inform
 ation from the training data (a.k.a. source data) to the target applicatio
 n. How much information is in the source data? How much target data should
  we collect\, if any? These are all practical questions that depend crucia
 lly on ‘how far’ the source domain is from the target. However\, how to pr
 operly measure ‘distance’ between source and target domains remains largel
 y unclear.\n\nIn this talk we will argue that many of the traditional noti
 ons of ‘distance’ (e.g. KL-divergence\, extensions of TV such as D_A discr
 epancy\, density-ratios\, Wasserstein distance) can yield an over-pessimis
 tic picture of transferability. Instead\, we show that some new notions of
  ‘relative dimension’ between source and target (which we simply term ‘tra
 nsfer-exponents’) capture a continuum from easy to hard transfer. Transfer
 -exponents uncover a rich set of situations where transfer is possible eve
 n at fast rates\, encode relative benefits of source and target samples\, 
 and have interesting implications for related problems such as multi-task 
 or multi-source learning.\n\nIn particular\, in the case of multi-source l
 earning\, we will discuss (if time permits) a strong dichotomy between min
 imax and adaptive rates: no adaptive procedure can achieve a rate better t
 han single-source rates\, although minimax (oracle) procedures can.\n\nThe
  talk is based on earlier work with Guillaume Martinet\, and ongoing work 
 with Steve Hanneke.\n
LOCATION:https://researchseminars.org/talk/MADPlus/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Giulio Biroli (ENS Paris)
DTSTART:20200729T140000Z
DTEND:20200729T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/13
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/13/"
 >On the benefit of over-parametrization and the origin of double descent c
 urves in artificial neural networks</a>\nby Giulio Biroli (ENS Paris) as p
 art of MAD+\n\n\nAbstract\nDeep neural networks have triggered a revolutio
 n in machine learning\, and more generally in computer science. Understand
 ing their remarkable performance is a key scientific challenge with many o
 pen questions. For instance\, practitioners find that using massively over
 -parameterised networks is beneficial to learning and generalization abili
 ty. This fact goes against standard theories\, and defies intuition. In th
 is talk I will address this issue. I will first contrast standard expectat
 ions based on the variance-bias trade-off with the results of numerical ex
 periments on deep neural networks\, which display a “double-descent” behav
 ior of the test error as the number of parameters increases\, instead of t
 he traditional U-curve. I will then discuss a theory of this phenomenon ba
 sed on the solution of simplified models of deep neural networks by statis
 tical physics methods.\n
LOCATION:https://researchseminars.org/talk/MADPlus/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Ahmed El Alaoui (Stanford)
DTSTART:20200722T140000Z
DTEND:20200722T150000Z
DTSTAMP:20260422T212553Z
UID:MADPlus/14
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MADPlus/14/"
 >Optimization of mean-field spin glass Hamiltonians</a>\nby Ahmed El Alaou
 i (Stanford) as part of MAD+\n\n\nAbstract\nWe consider the question of co
 mputing an approximate ground state configuration of an Ising (mixed) p-sp
 in Hamiltonian H_N from a bounded number of gradient evaluations.\n\nI wil
 l present an efficient algorithm which exploits the ultrametric structure 
 of the superlevel sets of H_N in order to achieve an energy E_* characteri
 zed via an extended Parisi variational principle. This energy E_* is optim
 al when the model satisfies a ‘no overlap gap’ condition. At the heart o
 f this algorithmic approach is a stochastic control problem\, whose dual t
 urns out to be the Parisi formula\, thereby shedding new light on the natu
 re of the latter.\n\nThis is joint work with Andrea Montanari and Mark Sel
 lke.\n
LOCATION:https://researchseminars.org/talk/MADPlus/14/
END:VEVENT
END:VCALENDAR
