I am building a Hugging Face Longformer-based classifier. My main code is below:
    import torch
    from torch import nn
    from transformers import (LongformerForSequenceClassification, LongformerTokenizerFast,
                              Trainer, TrainingArguments)

    model = LongformerForSequenceClassification.from_pretrained(
        '/mnt/longformer_official/', gradient_checkpointing=False, attention_window=512)
    tokenizer = LongformerTokenizerFast.from_pretrained('/mnt/longformer_official/', max_length=4000)

    train_df_tuning_dataset_tokenized = train_df_tuning_dataset.map(
        tokenization, batched=True, batch_size=len(train_df_tuning_dataset))

    training_args = TrainingArguments(
        output_dir="xyz",
        num_train_epochs=5,
        per_device_train_batch_size=4,     # value from the Hugging Face example notebook
        gradient_accumulation_steps=16,    # not in the example notebook; added back because of memory issues
        per_device_eval_batch_size=16,
        evaluation_strategy="epoch",
        save_strategy="epoch",             # from the Hugging Face example notebook
        learning_rate=2e-5,                # from the Hugging Face example notebook
        load_best_model_at_end=True,
        greater_is_better=False,
        disable_tqdm=False,
        weight_decay=0.01,
        optim="adamw_torch",
        run_name='longformer-classification-16March2022',
    )

    # Trainer subclass that applies class weights to the loss
    class CustomTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False):
            labels = inputs.get("labels")
            # forward pass
            outputs = model(**inputs)
            logits = outputs.get("logits")
            # weighted loss for 2 labels; create the weights on the same device as the logits
            loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 0.5243], device=logits.device))
            loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
            return (loss, outputs) if return_outputs else loss

    trainer = CustomTrainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_df_tuning_dataset_tokenized,
        eval_dataset=val_dataset_tokenized,
    )
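For reference, here is a pure-Python sketch of what the class-weighted loss in `compute_loss` computes, assuming PyTorch's documented behaviour for `nn.CrossEntropyLoss(weight=..., reduction='mean')`: each sample's negative log-likelihood is scaled by the weight of its true class, and the total is divided by the sum of those weights rather than by the number of samples. The function name `weighted_cross_entropy` is hypothetical. The weights only rescale a scalar, so they have no real effect on GPU memory.

```python
import math

def weighted_cross_entropy(logits, labels, weights):
    """Mean-reduced, class-weighted cross-entropy (mirrors nn.CrossEntropyLoss(weight=...))."""
    weighted_nll_sum = 0.0
    weight_sum = 0.0
    for row, y in zip(logits, labels):
        # numerically stable log-sum-exp over the classes of this sample
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_sum_exp - row[y]          # -log softmax(row)[y]
        weighted_nll_sum += weights[y] * nll
        weight_sum += weights[y]
    # 'mean' reduction divides by the sum of the per-sample weights
    return weighted_nll_sum / weight_sum

loss = weighted_cross_entropy([[2.0, 0.0], [0.0, 1.0]], [0, 1], [1.0, 0.5243])
```

Because the denominator is the sum of weights, a batch whose samples all belong to one class gets the same loss as the unweighted version; the weights only matter when classes are mixed in a batch.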
When I set max_length=1500 in the tokenizer, the code runs fine. It fails when run with a larger max_length (the code above uses 4000).
I even tried setting these parameters to:
per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1
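A rough, back-of-the-envelope sketch of why the longer sequence costs so much more. It assumes the Hugging Face Longformer implementation pads each input up to a multiple of `attention_window` and keeps `attention_window + 1` local attention scores per token and head in fp32 (global attention ignored), and it assumes 12 attention heads as in longformer-base. The helper names are hypothetical. It covers only one layer's attention-probability tensor; the real footprint (all layers, hidden states, gradients, optimizer state) is much larger, but the activation part grows roughly linearly with both batch size and padded sequence length.

```python
import math

def padded_length(seq_len: int, attention_window: int) -> int:
    """Length after padding up to a multiple of attention_window (assumed Longformer behaviour)."""
    return math.ceil(seq_len / attention_window) * attention_window

def attention_scores_mib(batch: int, seq_len: int, heads: int = 12,
                         attention_window: int = 512, bytes_per_elem: int = 4) -> float:
    """Approximate size in MiB of ONE layer's local attention-probability tensor.

    Assumes attention_window // 2 neighbours on each side plus the token itself,
    i.e. attention_window + 1 scores per token per head, stored in fp32 (4 bytes).
    """
    padded = padded_length(seq_len, attention_window)
    return batch * padded * heads * (attention_window + 1) * bytes_per_elem / 2**20

for max_len in (1500, 4000):
    print(max_len, padded_length(max_len, 512), f"{attention_scores_mib(1, max_len):.2f} MiB")
```

Under these assumptions, 1500 tokens are padded to 1536 and 4000 tokens to 4096, so even at batch size 1 each layer's attention scores grow from about 36 MiB to about 96 MiB, and that factor applies to the other sequence-length-dependent activations too.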
Is it okay to set per_device_train_batch_size = 1, gradient_accumulation_steps = 1, per_device_eval_batch_size = 1?
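On the batch-size question: with the Trainer, the number of samples that contribute to each optimizer step is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, while peak activation memory during training depends mainly on per_device_train_batch_size (and, during evaluation, on per_device_eval_batch_size). The small sketch below uses a hypothetical helper `effective_batch_size` to compare the settings in the code with the ones tried above, assuming a single GPU.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Samples whose gradients are accumulated into one optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices

original = effective_batch_size(4, 16)           # settings in the code: 64
all_ones = effective_batch_size(1, 1)            # settings tried in the question: 1
same_as_original = effective_batch_size(1, 64)   # batch 1, more accumulation: 64 again
```

Setting everything to 1 is legal, but it shrinks each update from 64 samples to 1, which changes the optimization (learning_rate may need retuning). Keeping per_device_train_batch_size = 1 and raising gradient_accumulation_steps keeps the original effective batch at roughly the memory cost of batch size 1.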
The error I get is below. Is there any way around this other than getting more GPU memory?
RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 14.76 GiB total capacity; 12.77 GiB already allocated; 111.75 MiB free; 13.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
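The figures in this message can be checked directly. The sketch below (hypothetical helper `parse_oom`, standard-library `re` only) extracts them: reserved minus allocated is only about 0.92 GiB, so reserved memory is not much larger than allocated memory. That suggests fragmentation is not the main problem and `max_split_size_mb` alone is unlikely to free the 720 MiB requested. If you still want to try it, the allocator option named in the message is set through the `PYTORCH_CUDA_ALLOC_CONF` environment variable before CUDA is initialized; the value 128 below is an arbitrary example, not a recommendation.

```python
import os
import re

OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 720.00 MiB (GPU 0; 14.76 GiB total "
    "capacity; 12.77 GiB already allocated; 111.75 MiB free; 13.69 GiB reserved in total by PyTorch)"
)

def parse_oom(message: str) -> dict:
    """Extract the numeric figures from a PyTorch CUDA out-of-memory message."""
    patterns = {
        "requested_mib": r"allocate ([\d.]+) MiB",
        "total_gib": r"([\d.]+) GiB total capacity",
        "allocated_gib": r"([\d.]+) GiB already allocated",
        "free_mib": r"([\d.]+) MiB free",
        "reserved_gib": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(p, message).group(1)) for key, p in patterns.items()}

stats = parse_oom(OOM_MESSAGE)
cached_but_unused_gib = stats["reserved_gib"] - stats["allocated_gib"]  # about 0.92 GiB
request_fits = stats["requested_mib"] <= stats["free_mib"]              # False: 720 MiB > 111.75 MiB

# Optional: allocator setting suggested by the message. It must be set before CUDA is
# initialized (simplest: before importing torch). 128 is an arbitrary example value.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
```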