Leverage the Average: An Analysis of Regularization in RL

Made for a reading group at the Center for Safe AGI.

Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Abstract:

Building upon the formalism of regularized Markov decision processes, we study the effect of Kullback-Leibler (KL) and entropy regularization in reinforcement learning. Through an equivalent formulation of the related approximate dynamic programming (ADP) scheme, we show that a KL penalty amounts to averaging q-values. This equivalence allows drawing connections between a priori disconnected methods from the literature, and proving that KL regularization indeed leads to averaging the errors made at each iteration of the value function update. With the proposed theoretical analysis, we also study the interplay between KL and entropy regularization. When the considered ADP scheme is combined with neural-network-based stochastic approximations, the equivalence is lost, which suggests a number of different ways to implement regularization. Because this goes beyond what we can analyse theoretically, we extensively study this aspect empirically.
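
To make the claim that "a KL penalty amounts to averaging q-values" concrete, here is a minimal sketch of the equivalence in my own notation (\(\lambda\) is the KL weight, \(q_k\) the k-th q-value estimate, \(\pi_0\) the uniform policy; the paper's exact statement and assumptions may differ). The KL-regularized greedy step has a closed form:

\[
\pi_{k+1} = \operatorname*{argmax}_{\pi} \big( \langle \pi, q_k \rangle - \lambda\, \mathrm{KL}(\pi \,\|\, \pi_k) \big)
\;\Longrightarrow\;
\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)\, \exp\!\big( q_k(s,a) / \lambda \big).
\]

Unrolling this recursion from the uniform \(\pi_0\) gives

\[
\pi_{k+1}(a \mid s) \propto \exp\!\Big( \tfrac{1}{\lambda} \sum_{j=0}^{k} q_j(s,a) \Big)
= \exp\!\Big( \tfrac{k+1}{\lambda}\, \bar{q}_k(s,a) \Big),
\qquad
\bar{q}_k = \tfrac{1}{k+1} \sum_{j=0}^{k} q_j,
\]

so the policy is a softmax of the running average of all past q-estimates, and the approximation errors made at each iteration are averaged rather than compounded.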

Speaker bio:

Matthieu Geist obtained an Electrical Engineering degree and an MSc degree in Applied Mathematics in Sept. 2006 (Supélec, France), a PhD degree in Applied Mathematics in Nov. 2009 (University Paul Verlaine of Metz, France), and a Habilitation degree in Feb. 2016 (University Lille 1, France). Between Feb. 2010 and Sept. 2017, he was an assistant professor at CentraleSupélec, France. In Sept. 2017, he joined the University of Lorraine, France, as a full professor in Applied Mathematics (Interdisciplinary Laboratory for Continental Environments, CNRS-UL). Since Sept. 2018, he has been on secondment at Google Brain as a research scientist (Paris, France). His research interests include machine learning, especially reinforcement learning and imitation learning.

Paper link