LongFormer - fp16 training without Trainer

I’m trying to fine-tune a LongFormer model (allenai/longformer-base-4096 · Hugging Face) on a single GPU (RTX 3090). However, I have a lot of data, which makes training very slow.

I’m looking for a way to train the model in FP16 precision (to reduce memory usage and speed up training), but I haven’t managed to do it without the standard Trainer class. Is it possible? How can I do that with a standard training loop?

for epoch in range(self.epochs):
    self.model.train()
    total_loss, total_val_loss = 0, 0
    for step, batch in enumerate(self.train_dataloader):
        self.model.zero_grad()
        # batch = (input_ids, attention_mask, token_type_ids, labels)
        outputs = self.model(batch[0].to(self.device),
                             attention_mask=batch[1].to(self.device),
                             token_type_ids=batch[2].to(self.device),
                             labels=batch[3].to(self.device))

        total_loss += outputs.loss.item()
        outputs.loss.backward()
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()

I think this is hidden a bit too deep in the documentation:

import torch
from transformers import AutoModelForSequenceClassification

# torch_dtype=torch.float16 loads the weights directly in half precision
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096",
    torch_dtype=torch.float16,
)

torch_dtype=torch.float16 does the trick
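For what it’s worth, if you’d rather keep the master weights in FP32 and only run the forward/backward pass in half precision, the usual approach in a hand-written loop is automatic mixed precision with torch.cuda.amp. Below is a minimal sketch of how the loop from the question could look with autocast and GradScaler; it assumes the same self.model, self.epochs, self.train_dataloader, self.device, optimizer and scheduler objects as in that snippet:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # handles loss scaling so small FP16 gradients don't underflow

for epoch in range(self.epochs):
    self.model.train()
    for step, batch in enumerate(self.train_dataloader):
        self.model.zero_grad()

        # run the forward pass (and loss computation) in mixed precision
        with autocast():
            outputs = self.model(batch[0].to(self.device),
                                 attention_mask=batch[1].to(self.device),
                                 token_type_ids=batch[2].to(self.device),
                                 labels=batch[3].to(self.device))

        # backward on the scaled loss, then unscale before gradient clipping
        scaler.scale(outputs.loss).backward()
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)

        # scaler.step skips the optimizer update if any gradients overflowed
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()

The GradScaler scales the loss so small FP16 gradients don’t underflow, which is why clip_grad_norm_ is applied only after scaler.unscale_(optimizer).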