I’ve been writing a deep Q-learning (DQN) model with experience replay and fixed Q-targets (a separate target network).
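For readers less familiar with the two techniques, here is a minimal sketch of how they usually fit together. It is not the code from this post. The replay buffer stores transitions and samples random mini-batches. The "fixed" targets are computed from a frozen copy of the network, which is only synced with the online network every so often. `ReplayBuffer` and `td_targets` are hypothetical names, and `next_q_target` is assumed to be the target network's Q-values for the sampled next states, with shape `(batch, n_actions)`.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once maxlen is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive steps of the same episode.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, dones, next_q_target, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal s'.

    next_q_target comes from the frozen target network, not the online one,
    so the regression target does not shift after every gradient step.
    """
    not_done = 1.0 - dones.astype(np.float64)
    return rewards + gamma * next_q_target.max(axis=1) * not_done
```

The online network is then regressed toward `td_targets(...)` only for the actions that were actually taken, and its weights are copied into the target network every N steps.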
The environment I’m using is a 4x4 grid, where the agent starts at the top-left corner and the objective is to reach the bottom-right corner.
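To make the setup concrete, a 4x4 grid like the one described can be sketched as follows. This is a hypothetical reconstruction, not the environment from the attached code. It assumes the states are numbered 0 to 15 in row-major order (start = 0, goal = 15) and the actions are 0 = up, 1 = right, 2 = down, 3 = left. It also assumes moves into a wall leave the agent in place, with a small step penalty and a reward of +1 at the goal. The actual reward scheme in the post may differ.

```python
import numpy as np


class GridWorld:
    """Hypothetical 4x4 grid: start at top-left (0), goal at bottom-right (15)."""

    # (row delta, col delta) for up, right, down, left
    ACTIONS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

    def __init__(self, size=4, step_penalty=-0.01, goal_reward=1.0):
        self.size = size
        self.goal = size * size - 1
        self.step_penalty = step_penalty
        self.goal_reward = goal_reward
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        row, col = divmod(self.state, self.size)
        d_row, d_col = self.ACTIONS[action]
        # Clamp to the grid so moving into a wall keeps the agent in place.
        row = min(max(row + d_row, 0), self.size - 1)
        col = min(max(col + d_col, 0), self.size - 1)
        self.state = row * self.size + col
        done = self.state == self.goal
        reward = self.goal_reward if done else self.step_penalty
        return self.state, reward, done


def one_hot(state, n_states=16):
    """Encode an integer state as a one-hot vector for the Q-network input."""
    vec = np.zeros(n_states, dtype=np.float32)
    vec[state] = 1.0
    return vec
```

Feeding the network a one-hot vector (instead of the raw integer index) is a common choice here, because it avoids implying that state 15 is "larger" than state 0.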
The training phase appears to be successful, with the loss approaching 0. However, during the testing phase the agent struggles to reach the target. (The image and Python code are attached below.)
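A TD loss near 0 only shows that the network agrees with its own bootstrapped targets. It does not by itself show that the greedy policy reaches the goal. One way to inspect the testing phase is to roll out the greedy policy (epsilon = 0) with a step cap and record the path it takes. The sketch below is a hypothetical helper. It assumes an `env` with `reset()` and `step(action) -> (state, reward, done)` and a `q_fn(state)` that returns one Q-value per action. Neither interface is taken from the attached code.

```python
import numpy as np


def evaluate_greedy(env, q_fn, max_steps=50):
    """Run one greedy (epsilon = 0) episode and report what happened.

    env   -- object with reset() -> state and step(action) -> (state, reward, done)
    q_fn  -- callable mapping a state to a 1-D array of Q-values, one per action
    """
    state = env.reset()
    path = [state]
    total_reward = 0.0
    for _ in range(max_steps):
        action = int(np.argmax(q_fn(state)))  # no exploration at test time
        state, reward, done = env.step(action)
        path.append(state)
        total_reward += reward
        if done:
            return {"reached_goal": True, "steps": len(path) - 1,
                    "return": total_reward, "path": path}
    # Hitting the cap usually means the greedy policy is stuck or cycling;
    # repeated states in `path` make that visible.
    return {"reached_goal": False, "steps": max_steps,
            "return": total_reward, "path": path}
```

If `path` shows the agent bouncing between two cells or pushing against a wall, the learned Q-values are consistent with each other but not yet pointing toward the goal.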
I would be deeply grateful if someone could offer guidance or suggest solutions to this issue.
Thank you so much in advance for your help and guidance.