The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these question...
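As an illustration of how such overestimation can arise, the sketch below takes the maximum over noisy estimates of action values, as the Q-learning target does. It is a minimal example and not the paper's experimental setup: the names `max_of_noisy_estimates` and `estimate_bias`, the zero true values, and the zero-mean Gaussian noise model are all assumptions made for illustration.

```python
import random
import statistics


def max_of_noisy_estimates(num_actions, noise_std, rng):
    """Return the max over noisy value estimates for a single state.

    Assumption for illustration: every action's true value is 0, and each
    estimate equals the true value plus zero-mean Gaussian noise.
    """
    estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
    return max(estimates)


def estimate_bias(num_actions, noise_std=1.0, trials=10000, seed=0):
    """Average the max estimate over many trials.

    The true maximum value is 0, so this average is the overestimation bias.
    """
    rng = random.Random(seed)
    samples = [
        max_of_noisy_estimates(num_actions, noise_std, rng)
        for _ in range(trials)
    ]
    return statistics.mean(samples)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(f"actions={n:2d}  average max estimate={estimate_bias(n):.3f}")
```

Under these assumptions, each individual estimate is unbiased, yet the average of the maximum is positive whenever there is more than one action, and it grows with the number of actions.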