BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Dylan Foster (MIT)
DTSTART:20200925T150500Z
DTEND:20200925T160500Z
DTSTAMP:20260423T024542Z
UID:sss/6
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/sss/6/">Sepa
 rating Estimation from Decision Making in Contextual Bandits</a>\nby Dylan
  Foster (MIT) as part of Stochastics and Statistics Seminar Series\n\n\nAb
 stract\nThe contextual bandit is a sequential decision making problem in w
 hich a learner repeatedly selects an action (e.g.\, a news article to disp
 lay) in response to a context (e.g.\, a user’s profile) and receives a r
 eward\, but only for the action they selected. Beyond the classic explore-
 exploit tradeoff\, a fundamental challenge in contextual bandits is to dev
 elop algorithms that can leverage flexible function approximation to model
  similarity between contexts\, yet have computational requirements compara
 ble to classical supervised learning tasks such as classification and regr
 ession. To this end\, we provide the first universal and optimal reduction
  from contextual bandits to online regression. We show how to transform an
 y oracle for online regression with a given value function class into an a
 lgorithm for contextual bandits with the induced policy class\, with no ov
 erhead in runtime or memory requirements. Conceptually\, our results show 
 that it is possible to provably separate estimation and decision making in
 to separate algorithmic building blocks\, and that this can be effective b
 oth in theory and in practice. Time permitting\, I will discuss extensions
  of these techniques to more challenging reinforcement learning problems.\
 n
LOCATION:https://researchseminars.org/talk/sss/6/
END:VEVENT
END:VCALENDAR
