Deep Q-Learning: Successful Training but Fails in Testing

I’ve been writing code for a deep Q-learning model with experience replay and fixed Q-targets (a separate target network).
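To make those two components concrete, here is a minimal sketch of what they usually look like. This is not the attached code: `ReplayBuffer`, `td_targets`, the `gamma` default, and the tuple layout are illustrative assumptions, and the target network itself is left out (only its output, `next_q_target`, is used).

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, which breaks the correlation between
        # consecutive transitions.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states), np.array(actions),
                np.array(rewards, dtype=np.float32),
                np.array(next_states), np.array(dones, dtype=np.float32))

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a'), without bootstrapping on terminal steps.

    next_q_target: Q-values of the next states from the *target* network
    (assumed to be a periodically synced copy), shape (batch, n_actions).
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)
```

One thing this sketch makes explicit is the `(1.0 - dones)` mask: if terminal transitions still bootstrap from the next state, the loss can shrink while the learned values stay wrong.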

The environment I’m using is a 4x4 grid, where the agent starts at the top-left corner and the objective is to reach the bottom-right corner.
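For reference, an environment of this kind can be sketched in a few lines. This is an illustration, not the attached code: the class name `GridWorld`, the action encoding, and the reward values are all assumptions.

```python
class GridWorld:
    """4x4 grid; the agent starts at top-left (0, 0), the goal is bottom-right (3, 3)."""

    # Assumed action encoding: 0=up, 1=down, 2=left, 3=right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)

    def reset(self):
        """Put the agent back at the top-left corner and return its position."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Apply one move and return (next_position, reward, done)."""
        dr, dc = self.MOVES[action]
        # Clamp to the grid so a move into a wall leaves the agent in place.
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        # Assumed reward scheme: small step penalty, +1 on reaching the goal.
        reward = 1.0 if done else -0.01
        return self.pos, reward, done
```

With this encoding, three moves right followed by three moves down reach the goal in six steps, which gives a known-good trajectory to compare the greedy policy against at test time.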

The training phase appears to be successful, with the loss approaching 0. However, during the testing phase, the agent struggles to reach the target. (Image and Python code attached below.)

