Bridging the Gap Between Value and Policy Based Reinforcement Learning

Nachum, Ofir; Norouzi, Mohammad; Xu, Kelvin; Schuurmans, Dale

doi:10.48550/arxiv.1702.08892

Shared by NobleBlocks on Feb 28, 2017 • 12:00 AM UTC

Authors:

Ofir Nachum

Mohammad Norouzi

Kelvin Xu

Dale Schuurmans

Abstract

We establish a new connection between value and policy based reinforcement learning (RL) based on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax consistent action values correspond to optimal entropy re...

Subject

Softmax function

Reinforcement learning

Computer science
