Hi,
I’m trying to fine-tune my first NLI model with Transformers on Colab. Specifically, I’m fine-tuning ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli on a dataset of around 276,000 hypothesis-premise pairs, following the instructions from the docs here and here.
The issue is that I get a memory error when I run the code below on Colab. My Colab GPU seems to have around 12 GB of RAM. The error occurs at the end, during the training step, but Colab already shows ~7 GB of RAM in use right after the encoding step. RAM usage then shoots up once training starts, and Colab crashes.
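For a rough sense of how much RAM the encoded data alone takes, here is a back-of-the-envelope sketch. It assumes every training pair ends up padded to max_length=256 (with padding=True the tokenizer pads to the longest sequence in the call, so this is an upper bound), that the "pt" tensors are int64 (8 bytes per element), and that three tensors come back (input_ids, attention_mask, token_type_ids). padded_tensor_bytes is a hypothetical helper, not a Transformers function.

```python
def padded_tensor_bytes(num_examples: int, seq_len: int, num_tensors: int = 3,
                        bytes_per_element: int = 8) -> int:
    """Rough size of the padded tensors returned by the tokenizer.

    Assumes every example is padded to seq_len and stored as int64
    (8 bytes per element), with one tensor per key in the encoding.
    """
    return num_examples * seq_len * num_tensors * bytes_per_element


# Upper bound for the training split described above (~276,000 pairs).
train_bytes = padded_tensor_bytes(276_000, 256)
print(f"{train_bytes / 1024**3:.2f} GiB")  # → 1.58 GiB
```

This covers only the padded id tensors of the training split; the validation and test encodings, the raw premise/hypothesis strings, and the tokenizer's intermediate objects all come on top of it.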
I’m new to fine-tuning models. It would be great if someone could give some advice on how to reduce the RAM footprint in the code below.
What I’ve tried:
- Using `model.half()` to reduce the memory footprint.
- Lowering `per_device_train_batch_size` and `per_device_eval_batch_size` from 32 to 8 to 2. (I’m not sure whether a lower number here reduces the memory requirement, or whether higher numbers are better for RAM.)

What else can/should be improved in the code below?
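On the batch-size question: activation memory during training grows roughly linearly with the per-device batch size, so a smaller batch does reduce GPU memory. To keep an effective batch of 32 while only fitting a couple of examples at a time, TrainingArguments has a gradient_accumulation_steps parameter, which accumulates gradients over several small batches before each optimizer step. The helper below is a hypothetical sketch (not part of Transformers) that picks a per-device batch size and an accumulation count whose product equals a target batch size.

```python
from typing import Tuple


def accumulation_plan(target_batch: int, max_per_device: int) -> Tuple[int, int]:
    """Hypothetical helper: choose (per_device_batch, accumulation_steps)
    so that per_device_batch * accumulation_steps == target_batch and
    per_device_batch never exceeds max_per_device.
    """
    if target_batch < 1 or max_per_device < 1:
        raise ValueError("batch sizes must be positive")
    # Take the largest divisor of target_batch that still fits on the device.
    for per_device in range(min(target_batch, max_per_device), 0, -1):
        if target_batch % per_device == 0:
            return per_device, target_batch // per_device
    return 1, target_batch  # not reached: 1 always divides target_batch


per_device, accum = accumulation_plan(target_batch=32, max_per_device=2)
print(per_device, accum)  # → 2 16
```

The two numbers would then go into TrainingArguments as per_device_train_batch_size=2 and gradient_accumulation_steps=16. Relatedly, TrainingArguments also accepts fp16=True, which enables mixed-precision training with loss scaling; it is the usual alternative to converting the whole model with model.half(), since training with pure fp16 weights and optimizer updates can underflow or produce NaN losses.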
Thanks a lot for your help!
My code:
```python
# ... some data preparation

### load model and tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

max_length = 256
hg_model_hub_name = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"
tokenizer = AutoTokenizer.from_pretrained(hg_model_hub_name)
model = AutoModelForSequenceClassification.from_pretrained(hg_model_hub_name)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
if device == "cuda":
    model = model.half()  # for half-precision training. reduces RAM requirement; decreases speed if on older GPU # https://huggingface.co/transformers/v1.1.0/examples.html
model.to(device)
model.train();

# ... some data preparation ...
encodings_train = tokenizer(premise_train, hypothesis_train, return_tensors="pt", max_length=max_length,
                            return_token_type_ids=True, truncation=True, padding=True)
encodings_val = tokenizer(premise_val, hypothesis_val, return_tensors="pt", max_length=max_length,
                          return_token_type_ids=True, truncation=True, padding=True)
encodings_test = tokenizer(premise_test, hypothesis_test, return_tensors="pt", max_length=max_length,
                           return_token_type_ids=True, truncation=True, padding=True)

### create pytorch dataset object
class XDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # The encodings are already tensors (return_tensors="pt"), so index them directly;
        # wrapping a tensor in torch.tensor() again makes a copy and raises a UserWarning.
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

dataset_train = XDataset(encodings_train, label_train)
dataset_val = XDataset(encodings_val, label_val)
dataset_test = XDataset(encodings_test, label_test)

### training
from transformers import Trainer, TrainingArguments

# https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=2,   # batch size per device during training
    per_device_eval_batch_size=2,    # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

trainer = Trainer(
    model=model,                     # the instantiated 🤗 Transformers model to be trained
    args=training_args,              # training arguments, defined above
    train_dataset=dataset_train,     # training dataset
    eval_dataset=dataset_val         # evaluation dataset
)

trainer.train()
```
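The code above pads each split up front (padding=True over the whole list), so every batch is as long as the longest pair in that split. A common alternative is to tokenize without padding and pad each batch only to its own longest sequence when it is collated; Transformers ships DataCollatorWithPadding for this, passed to Trainer via its data_collator argument. The sketch below only illustrates the idea in plain Python: pad_batch is a hypothetical helper, the token ids are made up, and the pad id of 1 is an assumption matching RoBERTa's `<pad>` token.

```python
from typing import Dict, List


def pad_batch(batch: List[List[int]], pad_id: int = 1) -> Dict[str, List[List[int]]]:
    """Pad a batch of token-id lists to the longest sequence in *this* batch.

    pad_id=1 is an assumption (RoBERTa's <pad> id); attention_mask marks
    real tokens with 1 and padding positions with 0.
    """
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two short (made-up) sequences are padded to length 4, not to max_length=256.
out = pad_batch([[0, 713, 2], [0, 713, 16, 2]])
print(out["input_ids"])       # → [[0, 713, 2, 1], [0, 713, 16, 2]]
print(out["attention_mask"])  # → [[1, 1, 1, 0], [1, 1, 1, 1]]
```

In the real pipeline this would roughly mean calling the tokenizer with truncation=True and max_length=max_length but without padding or return_tensors="pt" (unpadded sequences of different lengths cannot be stacked into one tensor), then passing data_collator=DataCollatorWithPadding(tokenizer) to Trainer. When most pairs are well under 256 tokens, this typically makes the batches sent to the GPU much shorter.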