BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Csaba Szepesvári (University of Alberta and DeepMind)
DTSTART:20200625T163000Z
DTEND:20200625T173000Z
DTSTAMP:20260423T003258Z
UID:MPML/5
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/MPML/5/">Con
 fident Off-Policy Evaluation and Selection through Self-Normalized Importa
 nce Weighting</a>\nby Csaba Szepesvári (University of Alberta and DeepMin
 d) as part of Mathematics\, Physics and Machine Learning (IST\, Lisbon)\n\
 n\nAbstract\nOff-policy evaluation is the problem of predicting the value 
 of a policy given some batch of data. In the language of statistics\, this
  is also called counterfactual estimation. Batch policy optimization refer
 s to the problem of finding a good policy\, again\, given some logged data
 .\nIn this talk\, I will consider the case of contextual bandits\, give a 
 brief (and incomplete) review of the approaches proposed in the literature
  and explain why this problem is difficult. Then\, I will describe a new a
 pproach based on self-normalized importance weighting. In this approach\, 
 a semi-empirical Efron-Stein concentration inequality is combined with Har
 ris' inequality to arrive at non-vacuous high-probability value lower boun
 ds\, which can then be used in a policy selection phase. On a number of sy
 nthetic and real datasets this new approach is found to be significantly s
 uperior to its main competitors\, both in terms of tightness of the conf
 idence intervals and the quality of the policies chosen.\n\nThe talk is b
 ased on joint work with Ilja Kuzborskij\, Claire Vernade and Andras Gyorgy
 .\n
LOCATION:https://researchseminars.org/talk/MPML/5/
END:VEVENT
END:VCALENDAR
