I’m trying to fine-tune a Longformer model (allenai/longformer-base-4096 · Hugging Face) on a single GPU (RTX 3090). However, my dataset is large, so training takes a very long time.
I’d like to train the model in FP16 precision (to reduce memory use and speed things up), but I only know how to enable that through the standard Trainer class, which I’m not using. Is it possible? How can I do that with a plain PyTorch training loop? This is my current loop:
for epoch in range(self.epochs):
    self.model.train()
    total_loss, total_val_loss = 0, 0
    for step, batch in enumerate(self.train_dataloader):
        self.model.zero_grad()
        outputs = self.model(batch[0].to(self.device),
                             attention_mask=batch[1].to(self.device),
                             token_type_ids=batch[2].to(self.device),
                             labels=batch[3].to(self.device))
        total_loss += outputs.loss.item()
        outputs.loss.backward()
        # Clip gradients to stabilize training
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        self.optimizer.step()
        self.scheduler.step()
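
From what I’ve read, PyTorch’s automatic mixed precision API (torch.cuda.amp) should make this possible without the Trainer. Below is a rough sketch of what I think the FP16 version of my loop would look like: the forward pass runs under autocast, and a GradScaler scales the loss so small FP16 gradients don’t underflow. The self.optimizer and self.scheduler names are my own from the loop above, and I haven’t verified this is correct, so please point out any mistakes:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # dynamically scales the loss to avoid FP16 gradient underflow

for epoch in range(self.epochs):
    self.model.train()
    total_loss = 0
    for step, batch in enumerate(self.train_dataloader):
        self.model.zero_grad()
        # Forward pass runs in mixed precision
        with autocast():
            outputs = self.model(batch[0].to(self.device),
                                 attention_mask=batch[1].to(self.device),
                                 token_type_ids=batch[2].to(self.device),
                                 labels=batch[3].to(self.device))
        total_loss += outputs.loss.item()
        # Backward pass on the scaled loss
        scaler.scale(outputs.loss).backward()
        # Unscale first so clipping applies to the true gradient values
        scaler.unscale_(self.optimizer)
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0)
        # scaler.step() skips the optimizer step if gradients contain inf/NaN
        scaler.step(self.optimizer)
        scaler.update()
        self.scheduler.step()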