Finetuning GPT2 using Multiple GPUs and Trainer

@SUNM It is my understanding that if GPUs are available, Trainer will use them. One way to check this is to keep your code exactly as written above and run it on a very small sample of the train_dataset and eval_dataset: something like 100 examples for training and maybe 20 or 50 for evaluation. I wouldn't worry about what the model evaluates to at this point; we just want to see whether the training and evaluation actually run on the GPU.
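If your train and eval sets are 🤗 Datasets objects, one quick way to cut them down is with `.select()`, which takes a subset by index. This is just a sketch reusing the variable names from your snippet:

```
# Sketch: assumes train_dataset / test_dataset are 🤗 Datasets objects,
# which expose .select() for picking examples by index.
small_train_dataset = train_dataset.select(range(100))  # first 100 examples
small_eval_dataset = test_dataset.select(range(20))     # first 20 examples

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
)
trainer.train()
```

If they are plain PyTorch datasets instead, torch.utils.data.Subset gives you the same thing.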

One straightforward way to do this is to run your training and evaluation from the command line in one tab, open a second command-line tab, and repeatedly run nvidia-smi (or `watch -n 1 nvidia-smi` to have it refresh automatically) to see whether GPU utilization rises during training and evaluation. Another option, if you have it installed, is to keep nvtop open in that second tab; it updates continuously on its own, so you don't have to rerun anything.
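You can also sanity-check the device from Python itself before kicking off the run. A minimal sketch, assuming `model` is the variable you pass to Trainer:

```
import torch

print(torch.cuda.is_available())        # True if PyTorch can see a GPU
print(torch.cuda.device_count())        # how many GPUs are visible
print(next(model.parameters()).device)  # where the model weights currently live
```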

I would say start there and report back what you find and we can go from there. There might be a better way to check if @valhalla or @sgugger want to chime in.

Lastly, while the code snippet is fairly straightforward to read and understand, it is easier for those wishing to respond if the code is surrounded by tick marks. For instance, instead of:
model = AutoModel.from_pretrained("").to("cuda")

training_args = TrainingArguments(
    output_dir="./gpt2-gerchef",     # the output directory
    overwrite_output_dir=True,       # overwrite the content of the output directory
    num_train_epochs=3,              # number of training epochs
    per_device_train_batch_size=32,  # batch size for training
    per_device_eval_batch_size=64,   # batch size for evaluation
    eval_steps=400,                  # number of update steps between two evaluations
    save_steps=800,                  # save the model every 800 steps
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

You can place three tick marks (```) on their own lines at the top and bottom of the code block to format it into:

```
model = AutoModel.from_pretrained("").to("cuda")

training_args = TrainingArguments(
    output_dir="./gpt2-gerchef",     # the output directory
    overwrite_output_dir=True,       # overwrite the content of the output directory
    num_train_epochs=3,              # number of training epochs
    per_device_train_batch_size=32,  # batch size for training
    per_device_eval_batch_size=64,   # batch size for evaluation
    eval_steps=400,                  # number of update steps between two evaluations
    save_steps=800,                  # save the model every 800 steps
    warmup_steps=500,                # number of warmup steps for the learning rate scheduler
    prediction_loss_only=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
```

which makes it easier to dissect and read. Here’s a short read about it.

Happy to help out. Let me know what you find.