Question on convergence of DQN and its variants

Hi there,

I am an EE major formally trained in DSP and has been working in the aerospace industries for years. A few year ago, I had started expanding my horizon into Deep Learning (DL) and machine learning but with limited experience. I started looking into reinforcement learning and specifically DQN and its variants a few weeks ago. And, I am surprise to find out that DQN and its variants even for a simple environment like CartPole-v1, there is no guarantee of convergence. In another word, when looking at the plot of Total Reward vs Episode, it is really ugly. Am I missing something here?