BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Maxim Raginsky (University of Illinois Urbana-Champaign)
DTSTART;VALUE=DATE-TIME:20200519T160000Z
DTEND;VALUE=DATE-TIME:20200519T173000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/1
DESCRIPTION:Title: Ne
ural SDEs: deep generative models in the diffusion limit\nby Maxim Rag
insky (University of Illinois Urbana-Champaign) as part of IAS Seminar Ser
ies on Theoretical Machine Learning\n\n\nAbstract\nIn deep generative mode
ls\, the latent variable is generated by a time-inhomogeneous Markov chain
\, where at each time step we pass the current state through a parametric
nonlinear map\, such as a feedforward neural net\, and add a small indepen
dent Gaussian perturbation. In this talk\, based on joint work with Belind
a Tzen\, I will discuss the diffusion limit of such models\, where we incr
ease the number of layers while sending the step size and the noise varian
ce to zero. I will first provide a unified viewpoint on both sampling and
variational inference in such generative models through the lens of stocha
stic control. Then I will show how we can quantify the expressiveness of d
iffusion-based generative models. Specifically\, I will prove that one can
efficiently sample from a wide class of terminal target distributions by
choosing the drift of the latent diffusion from the class of multilayer fe
edforward neural nets\, with the accuracy of sampling measured by the Kull
back-Leibler divergence to the target distribution. Finally\, I will brief
ly discuss a scheme for unbiased\, finite-variance simulation in such mode
ls. This scheme can be implemented as a deep generative model with a rando
m number of layers.\n
LOCATION:https://researchseminars.org/talk/IASML/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Roni Rosenfeld (Carnegie Mellon University)
DTSTART;VALUE=DATE-TIME:20200521T190000Z
DTEND;VALUE=DATE-TIME:20200521T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/2
DESCRIPTION:Title: Fo
recasting epidemics and pandemics\nby Roni Rosenfeld (Carnegie Mellon
University) as part of IAS Seminar Series on Theoretical Machine Learning\
n\n\nAbstract\nEpidemiological forecasting is critically needed for decisi
on making by national and local governments\, public health officials\, he
althcare institutions and the general public. The Delphi group at Carnegie
Mellon University was founded in 2012 to advance the theory and technolog
ical capability of epidemiological forecasting\, and to promote its role i
n decision making\, both public and private. Our long term vision is to ma
ke epidemiological forecasting as useful and universally accepted as weath
er forecasting is today. I will describe some of the methods we develo
ped over the past eight years for forecasting flu\, dengue and other epi
demics\, and the challenges we faced in adapting these methods to the CO
VID pandemic in the past few months.\n
LOCATION:https://researchseminars.org/talk/IASML/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Aleksander Madry (MIT)
DTSTART;VALUE=DATE-TIME:20200609T162000Z
DTEND;VALUE=DATE-TIME:20200609T175000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/4
DESCRIPTION:Title: Wh
at do our models learn?\nby Aleksander Madry (MIT) as part of IAS Semi
nar Series on Theoretical Machine Learning\n\n\nAbstract\nLarge-scale visi
on benchmarks have driven---and often even defined---progress in machine l
earning. However\, these benchmarks are merely proxies for the real-world
tasks we actually care about. How well do our benchmarks capture such task
s?\n\nIn this talk\, I will discuss the alignment between our benchmark-dr
iven ML paradigm and the real-world use cases that motivate it. First\, w
e will explore examples of biases in the ImageNet dataset\, and how state-
of-the-art models exploit them. We will then demonstrate how these biases
arise as a result of design choices in the data collection and curation pr
ocesses.\n\nBased on joint works with Logan Engstrom\, Andrew Ilyas\, Shib
ani Santurkar\, Jacob Steinhardt\, Dimitris Tsipras and Kai Xiao.\n
LOCATION:https://researchseminars.org/talk/IASML/4/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Michael I. Jordan (UC Berkeley)
DTSTART;VALUE=DATE-TIME:20200611T190000Z
DTEND;VALUE=DATE-TIME:20200611T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/5
DESCRIPTION:Title: On
Langevin Dynamics in Machine Learning\nby Michael I. Jordan (UC Berke
ley) as part of IAS Seminar Series on Theoretical Machine Learning\n\n\nAb
stract\nLangevin diffusions are continuous-time stochastic processes that
are based on the gradient of a potential function. As such they have many
connections---some known and many still to be explored---to gradient-based
machine learning. I'll discuss several recent results in this vein: (1) t
he use of Langevin-based algorithms in bandit problems\; (2) the accelerat
ion of Langevin diffusions\; (3) how to use Langevin Monte Carlo without m
aking smoothness assumptions. I'll present these results in the context of
a general argument about the virtues of continuous-time perspectives in t
he analysis of discrete-time optimization and Monte Carlo algorithms.\n
LOCATION:https://researchseminars.org/talk/IASML/5/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Avrim Blum (Toyota Technological Institute at Chicago)
DTSTART;VALUE=DATE-TIME:20200616T190000Z
DTEND;VALUE=DATE-TIME:20200616T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/6
DESCRIPTION:Title: On
learning in the presence of biased data and strategic behavior\nby Av
rim Blum (Toyota Technological Institute at Chicago) as part of IAS Semina
r Series on Theoretical Machine Learning\n\n\nAbstract\nIn this talk I wil
l discuss two lines of work involving learning in the presence of biased d
ata and strategic behavior. In the first\, we ask whether fairness constr
aints on learning algorithms can actually improve the accuracy of the clas
sifier produced\, when training data is unrepresentative or corrupted due
to bias. Typically\, fairness constraints are analyzed as a tradeoff with
classical objectives such as accuracy. Our results here show there are n
atural scenarios where they can be a win-win\, helping to improve overall
accuracy. In the second line of work we consider strategic classification
: settings where the entities being measured and classified wish to be cla
ssified as positive (e.g.\, college admissions) and will try to modify the
ir observable features if possible to make that happen. We consider this
in the online setting where a particular challenge is that updates made by
the learning algorithm will change how the inputs behave as well.\n
LOCATION:https://researchseminars.org/talk/IASML/6/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Csaba Szepesvári (University of Alberta)
DTSTART;VALUE=DATE-TIME:20200618T190000Z
DTEND;VALUE=DATE-TIME:20200618T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/7
DESCRIPTION:Title: Th
e challenges of model-based reinforcement learning and how to overcome the
m\nby Csaba Szepesvári (University of Alberta) as part of IAS Seminar
Series on Theoretical Machine Learning\n\n\nAbstract\nSome believe that t
ruly effective and efficient reinforcement learning algorithms must explic
itly construct and explicitly reason with models that capture the causal s
tructure of the world. In short\, model-based reinforcement learning is no
t optional. As this is not a new belief\, it may be surprising that empiri
cally\, at least as far as the current state of art is concerned\, the maj
ority of the top performing algorithms are model-free. In this talk\, I wi
ll define three major challenges that need to be overcome for model-bas
ed methods to take their place above\, or before\, the model-free ones
: (1) planning with large models\; (2) models are never well-specified
\; (3) models need to focus on task-relevant aspects and ignore others
. For each of the challenges\, I will describe recent results that addr
ess them\, and I will also take a tally of the most interesting (and ch
allenging) remaining open problems.\n
LOCATION:https://researchseminars.org/talk/IASML/7/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Sanjeev Arora (Princeton University and IAS)
DTSTART;VALUE=DATE-TIME:20200625T190000Z
DTEND;VALUE=DATE-TIME:20200625T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/8
DESCRIPTION:Title: In
stance-Hiding Schemes for Private Distributed Learning\nby Sanjeev Aro
ra (Princeton University and IAS) as part of IAS Seminar Series on Theoret
ical Machine Learning\n\n\nAbstract\nAn important problem today is how to
allow multiple distributed entities to train a shared neural network on th
eir private data while protecting data privacy. Federated learning is a st
andard framework for distributed deep learning\, and one would like to en
sure full privacy in that framework. The proposed metho
ds\, such as homomorphic encryption and differential privacy\, come with d
rawbacks such as large computational overhead or large drop in accuracy. T
his work introduces a new and simple encryption of training data\, which h
ides the information in it and allows its use in the usual deep learning p
ipeline. The encryption is inspired by the classic notion of instance-hid
ing i
n cryptography. Experiments show that it allows training with fairly small
effect on final accuracy.\n\nWe also give some theoretical analysis of pr
ivacy guarantees for this encryption\, showing that violating privacy requ
ires attackers to solve a difficult computational problem.\n\nJoint work w
ith Yangsibo Huang\, Zhao Song\, and Kai Li. To appear at ICML 2020.\n
LOCATION:https://researchseminars.org/talk/IASML/8/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Jennifer Listgarten (UC Berkeley)
DTSTART;VALUE=DATE-TIME:20200707T163000Z
DTEND;VALUE=DATE-TIME:20200707T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/9
DESCRIPTION:Title: Ma
chine learning-based design (of proteins\, small molecules and beyond)
\nby Jennifer Listgarten (UC Berkeley) as part of IAS Seminar Series on Th
eoretical Machine Learning\n\n\nAbstract\nData-driven design is making hea
dway into a number of application areas\, including protein\, small-molecu
le\, and materials engineering. The design goal is to construct an object
with desired properties\, such as a protein that binds to a target more ti
ghtly than previously observed. To that end\, costly experimental measurem
ents are being replaced with calls to a high-capacity regression model tra
ined on labeled data\, which can be leveraged in an in silico search for p
romising design candidates. The aim then is to discover designs that are b
etter than the best design in the observed data. This goal puts machine-le
arning based design in a much more difficult spot than traditional applica
tions of predictive modelling\, since successful design requires\, by defi
nition\, some degree of extrapolation---pushing the predictive model to i
ts unknown limits\, in parts of the design space that are a priori un
known. In this talk\, I will anchor this overall problem in protein engine
ering\, and discuss our emerging approaches to tackle it.\n
LOCATION:https://researchseminars.org/talk/IASML/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Anima Anandkumar (Caltech)
DTSTART;VALUE=DATE-TIME:20200709T190000Z
DTEND;VALUE=DATE-TIME:20200709T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/10
DESCRIPTION:Title: R
ole of Interaction in Competitive Optimization\nby Anima Anandkumar (C
altech) as part of IAS Seminar Series on Theoretical Machine Learning\n\n\
nAbstract\nCompetitive optimization is needed for many ML problems such as
training GANs\, robust reinforcement learning\, and adversarial learning.
Standard approaches to competitive optimization involve each agent indepe
ndently optimizing their objective functions using SGD or other gradient-b
ased approaches. However\, they suffer from oscillations and instability\,
since the optimization does not account for interaction among the players
. We introduce competitive gradient descent (CGD) that explicitly incorpor
ates interaction by solving for Nash equilibrium of a local game. We exten
d CGD to competitive mirror descent (CMD) for solving conically constraine
d competitive problems by using the dual geometry induced by a Bregman div
ergence.\n\nWe demonstrate the effectiveness of our approach for training
GANs and solving constrained reinforcement learning (RL) problems. We also
derive a competitive policy optimization method to train RL agents in com
petitive games. Finally\, we provide a novel perspective on training GAN
s by pointing out the "GAN-dilemma"\, a fundamental flaw of the divergen
ce-minimization perspective on GANs. Instead\, we argue that an implici
t competi
tive regularization due to simultaneous training methods\, such as CGD\, i
s a crucial mechanism behind GAN performance.\n
LOCATION:https://researchseminars.org/talk/IASML/10/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Max Welling (University of Amsterdam)
DTSTART;VALUE=DATE-TIME:20200721T163000Z
DTEND;VALUE=DATE-TIME:20200721T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/11
DESCRIPTION:Title: G
raph Nets: The Next Generation\nby Max Welling (University of Amsterda
m) as part of IAS Seminar Series on Theoretical Machine Learning\n\n\nAbst
ract\nIn this talk I will introduce our next generation of graph neural ne
tworks. GNNs have the property that they are invariant to permutations of
the nodes in the graph and to rotations of the graph as a whole. We claim
this is unnecessarily restrictive and in this talk we will explore extensi
ons of these GNNs to more flexible equivariant constructions. In particula
r\, Natural Graph Networks for general graphs are globally equivariant und
er permutations of the nodes but can still be executed through local messa
ge passing protocols. Our mesh-CNNs on manifolds are equivariant under SO(
2) gauge transformations and as such\, unlike regular GNNs\, entertain non
-isotropic kernels. And finally our SE(3)-transformers are local message p
assing GNNs\, invariant to permutations but equivariant to global SE(3) tr
ansformations. These developments clearly emphasize the importance of geom
etry and symmetries as design principles for graph (or other) neural netwo
rks.\n\nJoint with: Pim de Haan and Taco Cohen (Natural Graph Networks
)\; Pim de Haan\, Maurice Weiler\, and Taco Cohen (Mesh-CNNs)\; Fabia
n Fuchs and Daniel Worrall (SE(3)-Transformers)\n
LOCATION:https://researchseminars.org/talk/IASML/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Yoshua Bengio (Université de Montréal)
DTSTART;VALUE=DATE-TIME:20200723T190000Z
DTEND;VALUE=DATE-TIME:20200723T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/12
DESCRIPTION:Title: P
riors for Semantic Variables\nby Yoshua Bengio (Université de Montré
al) as part of IAS Seminar Series on Theoretical Machine Learning\n\n\nAbs
tract\nSome of the aspects of the world around us are captured in natural
language and refer to semantic high-level variables\, which often have a c
ausal role (referring to agents\, objects\, and actions or intentions). Th
ese high-level variables also seem to satisfy very peculiar characteristic
s which low-level data (like images or sounds) do not share\, and it would
be good to clarify these characteristics in the form of priors which can
guide the design of machine learning systems benefitting from these assump
tions. Since these priors are not just about the joint distribution betwee
n the semantic variables (e.g. it has a sparse factor graph corresponding
to a modular decomposition of knowledge) but also about how the distributi
on changes (typically by causal interventions)\, this analysis may also he
lp to build machine learning systems which can generalize better out-of-di
stribution. Introducing such assumptions is necessary to even start having
a theory about generalizing out-of-distribution. There are also fascinati
ng connections between these priors and what is hypothesized about conscio
us processing in the brain\, with conscious processing allowing us to reas
on (i.e.\, perform chains of inferences about the past and the future\, as
well as credit assignment) at the level of these high-level variables. Th
is involves attention mechanisms and short-term memory to form a bottlenec
k of information being broadcast around the brain between different parts
of it\, as we focus on different high-level variables and some of their in
teractions. The presentation summarizes a few recent results using some of
these ideas for discovering causal structure and modularizing recurrent n
eural networks with attention mechanisms in order to obtain better out-of-
distribution generalization and move deep learning towards capturing some
of the functions associated with conscious processing over high-level sema
ntic variables.\n
LOCATION:https://researchseminars.org/talk/IASML/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Jeffrey Negrea (University of Toronto)
DTSTART;VALUE=DATE-TIME:20200714T163000Z
DTEND;VALUE=DATE-TIME:20200714T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/13
DESCRIPTION:Title: R
elaxing the I.I.D. assumption: Adaptive minimax optimal sequential predicti
on with expert advice\nby Jeffrey Negrea (University of Toronto) as pa
rt of IAS Seminar Series on Theoretical Machine Learning\n\n\nAbstract\nWe
consider sequential prediction with expert advice when the data are gener
ated stochastically\, but the distributions generating the data may vary a
rbitrarily within some constraint set. We quantify relaxations of the class
ical I.I.D. assumption in terms of possible constraint sets\, with I.I.D.
at one extreme\, and an adversarial mechanism at the other. The Hedge algo
rithm\, long known to be minimax optimal in the adversarial regime\, h
as recently been shown to also be minimax optimal in the I.I.D. setting. W
e show that Hedge is sub-optimal between these extremes\, and present a ne
w algorithm that is adaptively minimax optimal with respect to our relaxat
ions of the I.I.D. assumption\, without knowledge of which setting prevail
s.\n
LOCATION:https://researchseminars.org/talk/IASML/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Arthur Gretton (University College London)
DTSTART;VALUE=DATE-TIME:20200728T163000Z
DTEND;VALUE=DATE-TIME:20200728T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/14
DESCRIPTION:Title: G
eneralized Energy-Based Models\nby Arthur Gretton (University College
London) as part of IAS Seminar Series on Theoretical Machine Learning\n\n\
nAbstract\nI will introduce Generalized Energy Based Models (GEBM) for gen
erative modelling. These models combine two trained components: a base dis
tribution (generally an implicit model)\, which can learn the support of d
ata with low intrinsic dimension in a high dimensional space\; and an ener
gy function\, to refine the probability mass on the learned support. Both
the energy function and base jointly constitute the final model\, unlike G
ANs\, which retain only the base distribution (the "generator"). In partic
ular\, while the energy function is analogous to the GAN critic function\,
it is not discarded after training.\nGEBMs are trained by alternating bet
ween learning the energy and the base. Both training stages are well-defin
ed: the energy is learned by maximising a generalized likelihood\, and the
resulting energy-based loss provides informative gradients for learning t
he base. Samples from the posterior on the latent space of the trained mod
el can be obtained via MCMC\, thus finding regions in this space that prod
uce better quality samples. Empirically\, the GEBM samples on image-genera
tion tasks are of much better quality than those from the learned gener
ator alone\, indicating that all else being equal\, the GEBM will outpe
rform a GAN of the same complexity. GEBMs also return state-of-the-art p
erformance on density modelling tasks when using base measures with an e
xplicit form.\n
LOCATION:https://researchseminars.org/talk/IASML/14/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Peter Stone (University of Texas at Austin)
DTSTART;VALUE=DATE-TIME:20200730T190000Z
DTEND;VALUE=DATE-TIME:20200730T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/15
DESCRIPTION:Title: E
fficient Robot Skill Learning via Grounded Simulation Learning\, Imitation
Learning from Observation\, and Off-Policy Reinforcement Learning\nby
Peter Stone (University of Texas at Austin) as part of IAS Seminar Series
on Theoretical Machine Learning\n\n\nAbstract\nFor autonomous robots to o
perate in the open\, dynamically changing world\, they will need to be abl
e to learn a robust set of skills from relatively little experience. This
talk begins by introducing Grounded Simulation Learning as a way to bridge
the so-called reality gap between simulators and the real world in order
to enable transfer learning from simulation to a real robot. It then intro
duces two new algorithms for imitation learning from observation that enab
le a robot to mimic demonstrated skills from state-only trajectories\, wit
hout any knowledge of the actions selected by the demonstrator. Connection
s to theoretical advances in off-policy reinforcement learning will be hig
hlighted throughout.\n\nGrounded Simulation Learning has led to the fastes
t known stable walk on a widely used humanoid robot\, and imitation learni
ng from observation opens the possibility of robots learning from the vast
trove of videos available online.\n
LOCATION:https://researchseminars.org/talk/IASML/15/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Aapo Hyvärinen (University of Helsinki)
DTSTART;VALUE=DATE-TIME:20200804T163000Z
DTEND;VALUE=DATE-TIME:20200804T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/16
DESCRIPTION:Title: N
onlinear independent component analysis\nby Aapo Hyvärinen (Universit
y of Helsinki) as part of IAS Seminar Series on Theoretical Machine Learni
ng\n\n\nAbstract\nUnsupervised learning\, in particular learning general n
onlinear representations\, is one of the deepest problems in machine learn
ing. Estimating latent quantities in a generative model provides a princip
led framework\, and has been successfully used in the linear case\, e.g. w
ith independent component analysis (ICA) and sparse coding. However\, exte
nding ICA to the nonlinear case has proven to be extremely difficult: A st
raightforward extension is unidentifiable\, i.e. it is not possible to re
cover those latent components that actually generated the data. Here\, we
show that this problem can be solved by using additional information eithe
r in the form of temporal structure or an additional observed variable. We
start by formulating two generative models in which the data is an arbitr
ary but invertible nonlinear transformation of time series (components) wh
ich are statistically independent of each other. Drawing from the theory o
f linear ICA\, we formulate two distinct classes of temporal structure of
the components which enable identification\, i.e. recovery of the original
independent components. We further generalize the framework to the case w
here instead of temporal structure\, an additional "auxiliary" variable is
observed and used by means of conditioning (e.g. audio in addition to vid
eo). Our methods are closely related to "self-supervised" methods heuristi
cally proposed in computer vision\, and also provide a theoretical foundat
ion for such methods in terms of estimating a latent-variable model. Likew
ise\, we show how variants of deep latent-variable models such as VAEs ca
n be seen as nonlinear ICA\, and made identifiable by suitable conditionin
g.\n
LOCATION:https://researchseminars.org/talk/IASML/16/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Eric Xing (Carnegie Mellon University)
DTSTART;VALUE=DATE-TIME:20200806T190000Z
DTEND;VALUE=DATE-TIME:20200806T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/17
DESCRIPTION:Title: A
Blueprint of Standardized and Composable Machine Learning\nby Eric Xi
ng (Carnegie Mellon University) as part of IAS Seminar Series on Theoretic
al Machine Learning\n\n\nAbstract\nIn handling a wide range of experienc
es ranging from data instances\, knowledge\, and constraints to reward
s\, adversaries\, and lifelong interplay in an ever-growing spectrum o
f tasks\, contemporary ML/AI research has resulted in thousands of mod
els\, learning paradigms\, and optimization algorithms\, not to mentio
n countless approximation heuristics\, tuning tricks\, and black-box o
racles\, plus combinations of all the above. While pushing the field f
orward rapidly\, these results also ma
ke a comprehensive grasp of existing ML techniques more and more difficult
\, and make standardized\, reusable\, repeatable\, reliable\, and explaina
ble practice and further development of ML/AI products quite costly\, i
f possible at all. In this talk\, we present a simple and systematic bl
uepr
int of ML\, from the aspects of losses\, optimization solvers\, and model
architectures\, that provides a unified mathematical formulation for learn
ing with all experiences and tasks. The blueprint offers a holistic unders
tanding of the diverse ML algorithms\, guidance for operationalizing M
L to create problem solutions in a composable and mechanical manner\, an
d a unified framework for theoretical analysis.\n
LOCATION:https://researchseminars.org/talk/IASML/17/
END:VEVENT
BEGIN:VEVENT
SUMMARY:John Shawe-Taylor (University College London)
DTSTART;VALUE=DATE-TIME:20200811T163000Z
DTEND;VALUE=DATE-TIME:20200811T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/18
DESCRIPTION:Title: S
tatistical Learning Theory for Modern Machine Learning\nby John Shawe-
Taylor (University College London) as part of IAS Seminar Series on Theore
tical Machine Learning\n\n\nAbstract\nProbably Approximately Correct (PAC)
learning has attempted to analyse the generalisation of learning systems
within the statistical learning framework. It has been referred to as a
‘worst case’ analysis\, but the tools have been extended to analyse ca
ses where benign distributions mean we can still generalise even if worst
case bounds suggest we cannot. The talk will cover the PAC-Bayes approach
to analysing generalisation that is inspired by Bayesian inference\, but l
eads to a different role for the prior and posterior distributions. We wil
l discuss its application to Support Vector Machines and Deep Neural Netwo
rks\, including the use of distribution-defined priors.\n
LOCATION:https://researchseminars.org/talk/IASML/18/
END:VEVENT
BEGIN:VEVENT
SUMMARY:John Langford (Microsoft Research)
DTSTART;VALUE=DATE-TIME:20200813T190000Z
DTEND;VALUE=DATE-TIME:20200813T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/19
DESCRIPTION:Title: L
atent State Discovery in Reinforcement Learning\nby John Langford (Mic
rosoft Research) as part of IAS Seminar Series on Theoretical Machine Lear
ning\n\n\nAbstract\nThere are three core orthogonal problems in reinfo
rcement learning: (1) crediting actions\; (2) generalizing across rich o
bservations\; (3) exploring to discover the information necessary for l
earning. Goo
d solutions to pairs of these problems are fairly well known at this point
\, but solutions for all three are just now being discovered. I’ll dis
cuss several such results and dive into details on a few of them.\n
LOCATION:https://researchseminars.org/talk/IASML/19/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Li Deng (Citadel)
DTSTART;VALUE=DATE-TIME:20200818T163000Z
DTEND;VALUE=DATE-TIME:20200818T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/20
DESCRIPTION:Title: F
rom Speech AI to Finance AI and Back\nby Li Deng (Citadel) as part of
IAS Seminar Series on Theoretical Machine Learning\n\n\nAbstract\nA brief
review will be provided first on how deep learning has disrupted speech re
cognition and language processing industries since 2009. Then connections
will be drawn between the techniques (deep learning or otherwise) for mode
ling speech and language and those for financial markets. Similarities and
differences of these two fields will be explored. In particular\, three u
nique technical challenges to financial investment are addressed: extremel
y low signal-to-noise ratio\, extremely strong nonstationarity (with adver
sarial nature)\, and heterogeneous big data. Finally\, how the potential s
olutions to these challenges can come back to benefit and further advance
speech recognition and language processing technology will be discussed.\n
LOCATION:https://researchseminars.org/talk/IASML/20/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Jason Eisner (Johns Hopkins University)
DTSTART;VALUE=DATE-TIME:20200820T190000Z
DTEND;VALUE=DATE-TIME:20200820T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/21
DESCRIPTION:Title: E
vent Sequence Modeling with the Neural Hawkes Process\nby Jason Eisner
(Johns Hopkins University) as part of IAS Seminar Series on Theoretical M
achine Learning\n\n\nAbstract\nSuppose you are monitoring discrete events
in real time. Can you predict what events will happen in the future\, and
when? Can you fill in past events that you may have missed? A probabili
ty model that supports such reasoning is the neural Hawkes process (NHP)\,
in which the Poisson intensities of K event types at time t depend on the
history of past events. This autoregressive architecture can capture com
plex dependencies. It resembles an LSTM language model over K word types\
, but allows the LSTM state to evolve in continuous time. \n\nThis talk w
ill present the NHP model along with methods for estimating parameters (ML
E and NCE)\, sampling predictions of the future (thinning)\, and imputing
missing events (particle smoothing). I'll then show how to scale the NHP
or the LSTM language model to large K\, beginning with a temporal deductiv
e database for a real-world domain\, which can track how possible event ty
pes and other facts change over time. We take the system state to be a co
llection of vector-space embeddings of these facts\, and derive a deep rec
urrent architecture from the temporal Datalog program that specifies the d
atabase. We call this method "neural Datalog through time."\n\nThis work
was done with Hongyuan Mei and other collaborators including Guanghui Qin\
, Minjie Xu\, and Tom Wan.\n
LOCATION:https://researchseminars.org/talk/IASML/21/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Piotr Indyk (Massachusetts Institute of Technology)
DTSTART;VALUE=DATE-TIME:20200825T163000Z
DTEND;VALUE=DATE-TIME:20200825T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/22
DESCRIPTION:Title: L
earning-Based Sketching Algorithms\nby Piotr Indyk (Massachusetts Inst
itute of Technology) as part of IAS Seminar Series on Theoretical Machine
Learning\n\n\nAbstract\nClassical algorithms typically provide "one size f
its all" performance\, and do not leverage properties or patterns in their
inputs. A recent line of work aims to address this issue by developing al
gorithms that use machine learning predictions to improve their performanc
e. In this talk I will present two examples of this type\, in the context
of streaming and sketching algorithms. In particular\, I will show how to
use machine learning predictions to improve the performance of (a) low-mem
ory streaming algorithms for frequency estimation\, and (b) generating spa
ce partitions for nearest neighbor search.\n\nThe talk will cover material
from papers co-authored with Y Dong\, CY Hsu\, D Katabi\, I Razenshteyn\,
T Wagner and A Vakilian.\n
LOCATION:https://researchseminars.org/talk/IASML/22/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Inderjit Dhillon (University of Texas at Austin)
DTSTART;VALUE=DATE-TIME:20200827T190000Z
DTEND;VALUE=DATE-TIME:20200827T203000Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/23
DESCRIPTION:Title: M
ulti-Output Prediction: Theory and Practice\nby Inderjit Dhillon (Univ
ersity of Texas at Austin) as part of IAS Seminar Series on Theoretical Ma
chine Learning\n\n\nAbstract\nMany challenging problems in modern applicat
ions amount to finding relevant results from an enormous output space of p
otential candidates\, for example\, finding the best matching product f
rom a large catalog or suggesting related search phrases on a search en
gine. The size of the output space for these problems can be in the mil
lions to billions. Moreover\, observational or training data is often l
imited for many of the so-called “long-tail” items in the output spac
e. Given t
he inherent paucity of training data for most of the items in the output s
pace\, developing machine learned models that perform well for spaces of t
his size is challenging. Fortunately\, items in the output space are often
correlated thereby presenting an opportunity to alleviate the data sparsi
ty issue. In this talk\, I will first discuss the challenges in modern mul
ti-output prediction\, including missing values\, features associated with
outputs\, absence of explicit negative examples\, and the need to scale u
p to enormous data sets. Bilinear methods\, such as Inductive Matrix Compl
etion (IMC)\, enable us to handle missing values and output features in pr
actice\, while coming with theoretical guarantees. Nonlinear methods such
as nonlinear IMC and DSSM (Deep Semantic Similarity Model) enable more pow
erful models that are used in practice in real-life applications. However\
, inference in these models scales linearly with the size of the output sp
ace. In order to scale up\, I will present the Prediction for Enormous and
Correlated Output Spaces (PECOS) framework\, that performs prediction in
three phases: (i) in the first phase\, the output space is organized using
a semantic indexing scheme\, (ii) in the second phase\, the indexing is u
sed to narrow down the output space by orders of magnitude using a machine
learned matching scheme\, and (iii) in the third phase\, the matched item
s are ranked by a final ranking scheme. The versatility and modularity o
f PECOS allow for easy plug-and-play of various choices for the indexin
g\, matching\, and ranking phases\, and it is possible to ensemble vari
ous models\, each arising from a particular choice for the three phases
.\n
LOCATION:https://researchseminars.org/talk/IASML/23/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Soheil Feizi (University of Maryland College Park)
DTSTART;VALUE=DATE-TIME:20200623T163000Z
DTEND;VALUE=DATE-TIME:20200623T174500Z
DTSTAMP;VALUE=DATE-TIME:20241016T081104Z
UID:IASML/24
DESCRIPTION:Title: G
eneralizable Adversarial Robustness to Unforeseen Attacks\nby Soheil F
eizi (University of Maryland College Park) as part of IAS Seminar Series o
n Theoretical Machine Learning\n\n\nAbstract\nIn the last couple of years\
, a lot of progress has been made to enhance the robustness of models a
gainst adversarial attacks. However\, two major shortcomings still rem
ain: (i) practical defenses are often vulnerable to strong “adaptiv
e” attack algorithms\, and (ii) current defenses have poor generaliz
ation to “unforeseen” attack threat models (the ones not used in tr
aining).\n\nIn thi
s talk\, I will present our recent results to tackle these issues. I will
first discuss generalizability of a class of provable defenses based on ra
ndomized smoothing to various Lp and non-Lp attack models. Then\, I will p
resent adversarial attacks and defenses for a novel “perceptual” adver
sarial threat model. Remarkably\, the defense against perceptual threat mo
del generalizes well against many types of unforeseen Lp and non-Lp advers
arial attacks.\n\nThis talk is based on joint works with Alex Levine\, Sah
il Singla\, Cassidy Laidlaw\, Aounon Kumar and Tom Goldstein.\n
LOCATION:https://researchseminars.org/talk/IASML/24/
END:VEVENT
END:VCALENDAR