Great differences between training and evaluation stats

Hello, I am having some problems with a CNN with a U-Net architecture. I am currently trying to train a model that performs topology optimization; for context, the model I am using comes from the article [1901.07761] A deep Convolutional Neural Network for topology optimization with strong generalization ability. The article reports more than 0.9 accuracy, but I am barely reaching 0.72. I assume the difference comes from the data set used for training, since I am using the same architecture.

The strange thing comes later. During training I get 0.72 accuracy and a loss of about 0.15, but when I load the saved weights and run an evaluation on the same validation data set, I get a loss of about 3.0 and low accuracy. Why is that?
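In case it helps to diagnose, my save/load/evaluate cycle is roughly equivalent to the sketch below (written in PyTorch as an illustration; the tiny stand-in model and the file name `weights.pt` are made up, only the workflow matters):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    # Hypothetical stand-in for the U-Net: a few layers including
    # BatchNorm and Dropout, whose behavior differs between modes.
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, padding=1),
        nn.BatchNorm2d(4),
        nn.Dropout(0.5),
        nn.Conv2d(4, 1, 3, padding=1),
    )

model = make_model()
x = torch.randn(8, 1, 16, 16)

# A few "training" forward passes so BatchNorm accumulates running stats.
model.train()
for _ in range(5):
    model(x)

# Record a reference output in evaluation mode, then save the weights.
model.eval()
with torch.no_grad():
    out_original = model(x)
torch.save(model.state_dict(), "weights.pt")

# Reload the weights into a fresh model instance, as I do before evaluating.
model2 = make_model()
model2.load_state_dict(torch.load("weights.pt"))

# eval() freezes BatchNorm to its saved running statistics and disables
# Dropout; in this mode the reloaded model reproduces the original exactly.
model2.eval()
with torch.no_grad():
    out_reloaded = model2(x)

print(torch.allclose(out_original, out_reloaded))  # True: reload is exact
```

If this round trip reproduces the training-time outputs exactly, is there something else in my evaluation step that could explain the jump from 0.15 to 3.0 loss?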

Thank you