What am I missing with my RL project

I’m training an agent to get good at a game I made: it pilots a spacecraft in a 2D environment where asteroids fall downward. When an asteroid reaches the bottom, it respawns at the top at a random position with a random speed. (Is that too stochastic?)

Normal DQN and Double DQN weren’t working.

I switched to Dueling DQN and added a replay buffer.
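For anyone unfamiliar, the dueling architecture splits the Q-function into a state-value stream and an advantage stream and recombines them with a mean-subtracted aggregation. A minimal NumPy sketch of that aggregation step (names and shapes are illustrative, not my actual network code):

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine state value V(s) and per-action advantages A(s, a)
    into Q-values using the standard dueling aggregation:
        Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    Subtracting the mean advantage keeps V and A identifiable."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()

# Example: V(s) = 2.0, three actions with advantages [1, -1, 0]
q_values = dueling_q(2.0, [1.0, -1.0, 0.0])  # -> [3.0, 1.0, 2.0]
```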

The loss finally decreases as training continues, but the learned policy still gives highly variable performance with no improvement on average.

Is there something wrong with my reward structure?

Currently using +1 for every step survived plus a -50 penalty for an asteroid collision.
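To be concrete, here is a sketch of the reward scheme as I described it (function and constant names are just for illustration, not my actual game code):

```python
def step_reward(collided: bool) -> float:
    """Reward per environment step: +1 for surviving the step,
    -50 if the spacecraft collided with an asteroid this step."""
    SURVIVAL_REWARD = 1.0
    COLLISION_PENALTY = -50.0
    return COLLISION_PENALTY if collided else SURVIVAL_REWARD
```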

Any help you can give would be very much appreciated. I am new to this and have been struggling for days.
