Policy Optimization in Reinforcement Learning: A Tale of Preconditioning and Regularization

Yuejie Chi (Carnegie Mellon University)

25-Jun-2021, 13:00-14:00

Abstract: Policy optimization, which learns the policy of interest by maximizing the value function via large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL). Beyond value maximization, other practical considerations commonly arise, including the need to encourage exploration and to ensure certain structural properties of the learned policy due to safety, resource, and operational constraints. These considerations can often be accounted for by resorting to regularized RL, which augments the target value function with a structure-promoting regularization term such as Shannon entropy, Tsallis entropy, or a log-barrier function. Focusing on an infinite-horizon discounted Markov decision process, this talk first shows that entropy-regularized natural policy gradient methods converge globally at a linear rate that is nearly independent of the dimension of the state-action space. Next, a generalized policy mirror descent algorithm is proposed to accommodate a general class of convex regularizers beyond Shannon entropy. Encouragingly, this general algorithm inherits similar convergence guarantees, even when the regularizer lacks strong convexity and smoothness. Our results accommodate a wide range of learning rates and shed light on the role of regularization in enabling fast convergence in RL.
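To make the first result concrete, here is a minimal sketch of entropy-regularized natural policy gradient (NPG) on a tabular MDP with softmax policies. It assumes exact soft Q-values and the closed-form multiplicative update pi_new(a|s) ∝ pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta * Q_tau(s,a) / (1-gamma)), with the step size kept in the range 0 < eta <= (1-gamma)/tau. This update form and step-size range are our assumption and are not stated in the abstract. The toy MDP, the values of gamma, tau and eta, and the helper names (soft_q, npg_step, entropy_npg) are all hypothetical and chosen for illustration only.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative, not from the talk).
# P[s][a][t] = probability of moving to state t; R[s][a] = immediate reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
R = [[1.0, 0.0],
     [0.0, 0.5]]
GAMMA, TAU = 0.9, 0.1  # discount factor and entropy weight (assumed values)
S, A = 2, 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    w = [math.exp(x - m) for x in logits]
    z = sum(w)
    return [x / z for x in w]


def soft_q(pi, tol=1e-12):
    """Evaluate the entropy-regularized (soft) Q-function of policy pi."""
    v = [0.0] * S
    while True:
        # Soft Bellman backup: reward + entropy bonus + discounted next value.
        new_v = [
            sum(pi[s][a] * (R[s][a] - TAU * math.log(pi[s][a])
                            + GAMMA * sum(P[s][a][t] * v[t] for t in range(S)))
                for a in range(A))
            for s in range(S)
        ]
        done = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if done:
            break
    return [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(S))
             for a in range(A)] for s in range(S)]


def npg_step(pi, q, eta):
    """One entropy-regularized NPG update, computed in log space per state."""
    alpha = 1.0 - eta * TAU / (1.0 - GAMMA)  # weight kept on the old policy
    return [softmax([alpha * math.log(pi[s][a]) + eta * q[s][a] / (1.0 - GAMMA)
                     for a in range(A)])
            for s in range(S)]


def entropy_npg(eta=0.5, iters=500):
    """Run NPG from the uniform policy; eta is assumed to satisfy eta <= (1-gamma)/tau."""
    assert 0 < eta <= (1.0 - GAMMA) / TAU
    pi = [[1.0 / A] * A for _ in range(S)]
    for _ in range(iters):
        pi = npg_step(pi, soft_q(pi), eta)
    return pi


if __name__ == "__main__":
    print(entropy_npg())
```

Choosing eta = (1-gamma)/tau makes the exponent on the old policy zero, so each step becomes pi_new ∝ exp(Q_tau/tau), that is, soft policy iteration. Smaller eta keeps part of the previous policy, which is one way to see the "wide range of learning rates" mentioned above. At convergence the policy satisfies pi(.|s) = softmax(Q_tau(s,.)/tau), the fixed point of the regularized problem.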

Topics: data structures and algorithms; machine learning; mathematical physics; information theory; optimization and control; data analysis, statistics and probability

Audience: researchers in the topic

(video)


Mathematics, Physics and Machine Learning (IST, Lisbon)

Series comments: To receive the series announcements, please register at:
mpml.tecnico.ulisboa.pt
mpml.tecnico.ulisboa.pt/registration
Zoom link: videoconf-colibri.zoom.us/j/91599759679

Organizers: Mário Figueiredo, Tiago Domingos, Francisco Melo, Jose Mourao*, Cláudia Nunes, Yasser Omar, Pedro Alexandre Santos, João Seixas, Cláudia Soares, João Xavier
*contact for this listing
