Hi, I am using PyTorch and Hugging Face to fine-tune roberta-base on the RTE dataset.
But I find that just calling torch.half
on my model causes NaNs after the first backward pass.
Is there any way to train my model in fp16 without using Hugging Face's Trainer
class?
Yes, you can do this without the Hugging Face Trainer. You'll have to use PyTorch's AMP feature directly (this is what the HF Trainer does as well, if you specify "amp" for the half_precision_backend). Unlike a blanket .half() cast, AMP keeps the master weights in fp32 and applies gradient scaling, which avoids the fp16 underflow/overflow that produces the NaNs you're seeing. You'd want to get started by reading the PyTorch AMP docs: https://pytorch.org/docs/stable/amp.html
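For illustration, here's a minimal sketch of what such a loop can look like with autocast and GradScaler. The dummy sentence pair, label, and hyperparameters are placeholders, not from your setup; swap in your own RTE DataLoader:

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model weights stay in fp32; autocast runs the forward pass in fp16
# where it is safe, and GradScaler scales the loss so fp16 gradients
# don't underflow (the usual cause of NaNs with a plain .half() cast).
device = torch.device("cuda")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = GradScaler()

# Dummy RTE-style premise/hypothesis batch for illustration only.
batch = tokenizer(
    ["A man is playing a guitar."],
    ["Someone is making music."],
    return_tensors="pt",
    padding=True,
).to(device)
labels = torch.tensor([0], device=device)

model.train()
for step in range(3):  # stand-in for your epoch/step loop
    optimizer.zero_grad(set_to_none=True)
    with autocast():  # mixed-precision forward pass
        outputs = model(**batch, labels=labels)
        loss = outputs.loss
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips step on inf/nan
    scaler.update()                # adjusts the scale factor for next step
    print(f"step {step}: loss {loss.item():.4f}")
```

The key difference from your current approach is that the backward pass runs on a scaled loss, and scaler.step() silently skips any optimizer step whose gradients contain inf/nan instead of corrupting the weights.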