TD3 reward not increasing over time
Hey, for a uni project I have implemented TD3 and am trying to test it on Pendulum-v1 before using the assigned environment.
Here is the list of my hyperparameters:
"actor_lr": 0.0001,
"critic_lr": 0.0001,
"discount": 0.95,
"tau": 0.005,
"batch_size": 128,
"hidden_dim_critic": [256, 256],
"hidden_dim_actor": [256, 256],
"noise": "Gaussian",
"noise_clip": 0.3,
"noise_std": 0.2,
"policy_update_freq": 2,
"buffer_size": int(1e6),
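For reference, here is a minimal sketch of how these settings usually enter the TD3 critic target for a single transition. It is not the code from this project: the function name `td3_target`, the callables `q1_next` and `q2_next` (stand-ins for the two target critics evaluated at the next state), and the action bound of 2.0 (the Pendulum-v1 torque limit) are assumptions made for illustration.

```python
import random


def td3_target(reward, done, next_action, q1_next, q2_next,
               discount=0.95, noise_std=0.2, noise_clip=0.3,
               max_action=2.0, rng=random):
    """Critic target for one transition, following the usual TD3 recipe.

    q1_next and q2_next are hypothetical stand-ins for the two target
    critics; each maps an action to a Q-value at the next state.
    next_action is the target actor's output for the next state.
    """
    # Target policy smoothing: Gaussian noise, clipped to +/- noise_clip.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    # Keep the smoothed action inside the valid action range.
    smoothed = max(-max_action, min(max_action, next_action + noise))
    # Clipped double-Q: use the smaller of the two target critic values.
    q_next = min(q1_next(smoothed), q2_next(smoothed))
    # Do not bootstrap past a true terminal state.
    return reward + discount * (1.0 - float(done)) * q_next
```

With "policy_update_freq": 2, the actor and the target networks would be updated once for every two critic updates that use a target like this one.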
The issue I'm facing is that the reward keeps decreasing over time and saturates at around -1450 after some episodes. Does anyone have any ideas where my issues could lie?
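To put the -1450 figure in context, here is a small sketch of the reward scale, assuming the per-step reward formula and limits given in the Pendulum-v1 documentation (angle measured from upright, speed limit 8, torque limit 2, 200 steps per episode). The constant and function names are made up for this example.

```python
import math

MAX_SPEED = 8.0       # angular velocity limit (assumed from the env docs)
MAX_TORQUE = 2.0      # torque limit (assumed from the env docs)
EPISODE_STEPS = 200   # default time limit per episode


def pendulum_reward(theta, theta_dot, torque):
    # theta is the angle from upright, normalized to [-pi, pi].
    return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)


# Worst possible single step: hanging straight down at full speed and torque.
worst_step = pendulum_reward(math.pi, MAX_SPEED, MAX_TORQUE)
# Lower bound on the episode return under the default time limit.
worst_episode = EPISODE_STEPS * worst_step
# Return if the pendulum just hung motionless at the bottom for the episode.
hanging_episode = EPISODE_STEPS * pendulum_reward(math.pi, 0.0, 0.0)

print(worst_step, worst_episode, hanging_episode)
```

Under these assumptions the best possible return is 0 (balanced upright with no torque), so an average of -1450 means the pendulum spends most of each episode far from upright.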
If needed, I could also provide any code where you suspect a bug might be.
Thanks in advance for your help!