It seems that converting the model's dtype changes the training loss.
For instance, the training losses of a) and b) are inconsistent:
a):
train(model)
model.half()    # cast weights to fp16 for evaluation
eval(model)
model.float()   # cast back, but the fp16 rounding is not undone
b):
train(model)
eval(model)
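Presumably the loss changes because half() rounds the fp32 weights to fp16 and float() cannot restore the lost bits, so training in a) resumes from slightly perturbed weights. A minimal sketch of the lossy round trip (the random tensor here is just an illustration, not the actual model):

```python
import torch

# half() truncates fp32 values to fp16; float() casts back but cannot
# recover the discarded precision, so the round trip perturbs the weights.
w = torch.randn(1000, dtype=torch.float32)
round_trip = w.half().float()

print(torch.equal(w, round_trip))            # almost certainly False
print((w - round_trip).abs().max().item())   # max rounding error introduced
```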
I have to use deepcopy to work around the issue:
from copy import deepcopy

train(model)
model_copy = deepcopy(model).half()  # evaluate a throwaway fp16 copy
eval(model_copy)
Is there a better way to evaluate the model in fp16 during training without hard-copying it?