How can I use trainer.train() on RunPod with multiple GPUs?

!pip install --upgrade pip
!pip install transformers
!pip install datasets
!pip install pandas
!pip install openpyxl
!pip install accelerate

from transformers import Trainer, TrainingArguments
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("skt/kogpt2-base-v2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("skt/kogpt2-base-v2")
model.resize_token_embeddings(len(tokenizer))

# dynamically pads both inputs and labels per batch
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors='pt')

training_args = TrainingArguments(
    output_dir='./outputs',
    logging_dir='./logs',
    num_train_epochs=1,
    per_device_train_batch_size=1,  # per GPU, so the effective batch size scales with GPU count
    per_device_eval_batch_size=1,
    logging_steps=50,
    save_steps=50,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset,  # tokenized dataset prepared earlier (preparation code omitted here)
    data_collator=data_collator,
)

trainer.train()

I set up the training code like this on RunPod with 2x L40 GPUs. It works fine on a single L40.

But when I run the same code on 2 L40s, it doesn't train.

It allocates GPU memory and shows utilization as usual, but the model weights just never update.
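For context, I just run these cells in a Jupyter notebook on the pod. As far as I understand, Trainer only does DistributedDataParallel training when the script is started through a distributed launcher (without one it falls back to DataParallel), so maybe that is what I'm missing. A minimal sketch of what I mean, assuming the code above is saved as train.py (run from a terminal on the pod, or prefixed with ! in a notebook cell):

# torchrun ships with PyTorch; one process per GPU
torchrun --nproc_per_node=2 train.py

# or with the accelerate launcher installed above
accelerate launch --num_processes 2 train.py

Is launching it this way the right approach, or is trainer.train() supposed to pick up both GPUs on its own from inside the notebook?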