We establish a new connection between value-based and policy-based reinforcement learning (RL), built on a relationship between softmax temporal value consistency and policy optimality under entropy regularization. Specifically, we show that softmax-consistent action values correspond to optimal entropy re...