Deep Q-Learning: Successful Training but Fails in Testing

Hello everyone,

I’ve been writing a deep Q-learning model with experience replay and fixed Q-targets.
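For context, here is a minimal sketch of what I mean by those two pieces. It is not the code from the repository: the names `ReplayBuffer` and `td_target` and the default `gamma` are assumptions made for illustration, and `next_q_values` stands for the output of a separate, periodically updated target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10000):
        # deque with maxlen silently drops the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrapped target; next_q_values is assumed to come from the target network."""
    if done:
        # No bootstrapping past a terminal state
        return reward
    return reward + gamma * max(next_q_values)
```

The `done` branch matters: if terminal transitions still bootstrap from the next state, the loss can shrink while the learned values stay wrong.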

The environment I’m using is a 4x4 grid, where the agent starts at the top-left corner and the objective is to reach the bottom-right corner.
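A rough sketch of that kind of environment is below. The class name `GridWorld`, the action encoding, and the reward values are assumptions for illustration and may differ from the linked code.

```python
class GridWorld:
    """Hypothetical 4x4 grid: start at top-left (0, 0), goal at bottom-right (3, 3)."""

    # Assumed action encoding: 0 = up, 1 = down, 2 = left, 3 = right
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        d_row, d_col = self.MOVES[action]
        # Clamp to the grid so a move into a wall leaves the agent in place
        row = min(max(self.pos[0] + d_row, 0), self.size - 1)
        col = min(max(self.pos[1] + d_col, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        # Assumed reward scheme: small step penalty, +1 on reaching the goal
        reward = 1.0 if done else -0.01
        return self.pos, reward, done
```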

The training phase appears to be successful, with the loss approaching 0. However, during the testing phase, the agent struggles to reach the target (image and Python code attached below).
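To make "testing" concrete: a low loss only says the network fits its own targets, so the check that matters is a greedy rollout with no exploration. A minimal sketch of such a check, assuming a hypothetical `q_values` callable that maps a `(row, col)` state to four action values (0 = up, 1 = down, 2 = left, 3 = right), might look like this. The step cap is there because a greedy policy that oscillates between two cells would otherwise never terminate.

```python
def step(state, action, size=4):
    """Assumed grid dynamics: moves into a wall leave the agent in place."""
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
    d_row, d_col = moves[action]
    row = min(max(state[0] + d_row, 0), size - 1)
    col = min(max(state[1] + d_col, 0), size - 1)
    return (row, col)


def greedy_rollout(q_values, max_steps=50, size=4):
    """Follow argmax-Q from the top-left corner; return (reached_goal, path)."""
    state = (0, 0)
    goal = (size - 1, size - 1)
    path = [state]
    for _ in range(max_steps):
        values = q_values(state)
        # Pure exploitation: pick the index of the largest action value
        action = max(range(len(values)), key=values.__getitem__)
        state = step(state, action, size)
        path.append(state)
        if state == goal:
            return True, path
    # Step cap hit: the greedy policy is stuck or looping
    return False, path
```

Printing `path` for a failed rollout shows whether the agent is pinned against a wall or bouncing between neighbouring cells, which narrows down where the learned values go wrong.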

I would be deeply grateful if someone could offer guidance or suggest solutions to address this issue.

Thank you so much in advance for your help and guidance.

Screenshot 2023-08-21 144549

github link