Using Trainer to train a BART model on 4 GPUs failed

I am using Trainer to train a BART model on 4 GPUs (one node). The command line is:
python -m torch.distributed.launch my_file.py
and the code is:

args = Seq2SeqTrainingArguments(
    "bart-large-copy",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.001,
    #lr_scheduler_type='cosine',
    adam_beta1=0.9,
    adam_beta2=0.98,
    #warmup_steps=4000,
    label_smoothing_factor=0.1,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    save_total_limit=3,
    num_train_epochs=50,
    predict_with_generate=True,
    max_grad_norm=0.0,
    load_best_model_at_end=True,
    metric_for_best_model='sacrebleu',
    greater_is_better=True,
    fp16=False,
    gradient_accumulation_steps=2,   
)
trainer = Seq2SeqTrainer(
    model=model, 
    args=args,
    train_dataset=dataset['train'],
    eval_dataset=dataset['dev'],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
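As a side note on the setup above: with load_best_model_at_end=True and metric_for_best_model='sacrebleu', the compute_metrics function must return a dict containing a 'sacrebleu' key, or Trainer will raise a KeyError when selecting the best checkpoint. A minimal sketch of that contract is below; the scoring here is a toy word-overlap stand-in, not real sacreBLEU, and in practice you would decode the predictions with your tokenizer and call a BLEU implementation.

```python
def compute_metrics(eval_preds):
    # Seq2SeqTrainer passes (predictions, label_ids); in this toy sketch
    # they are assumed to be already-decoded strings.
    preds, labels = eval_preds
    # Toy overlap score standing in for sacreBLEU (hypothetical metric,
    # for illustrating the return shape only).
    scores = []
    for pred, label in zip(preds, labels):
        pred_tokens, label_tokens = pred.split(), label.split()
        overlap = len(set(pred_tokens) & set(label_tokens))
        scores.append(overlap / max(len(label_tokens), 1))
    # The key must match metric_for_best_model in the training arguments.
    return {"sacrebleu": 100.0 * sum(scores) / max(len(scores), 1)}

print(compute_metrics((["hello world"], ["hello world"])))
```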

I got the error info as follows:

envs:
transformers: 4.16.2
torch: 1.9.0+rocm4.0.1
gpus: 4 × AMD Vega 20 (66a1)

I don't know why it doesn't work.
I would appreciate any suggestions.
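For comparison, torch.distributed.launch starts only one process per node by default (nproc_per_node defaults to 1), so a single-node multi-GPU launch normally passes it explicitly. A sketch of the usual invocation is below, assuming my_file.py accepts the --local_rank argument the launcher injects (which Trainer's argument parsing handles):

```shell
# Launch 4 processes on this node, one per GPU; the launcher passes
# --local_rank=<0..3> to each copy of my_file.py.
python -m torch.distributed.launch --nproc_per_node=4 my_file.py
```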