RoBERTa-base fine-tuning takes too long

I am fine-tuning roberta-base with this script on ~8M examples for classification, on a machine with 6 NVIDIA RTX A6000 GPUs. Max length is 64, batch_size=32, gradient_accumulation_steps=4. One epoch takes ~8 hours. Is this normal? I see that the original model was pretrained on 1024 GPUs.
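For reference, here is the arithmetic on my setup (assuming batch_size=32 is per device and all 6 GPUs are used via data parallelism, so the optimizer-step count below is an estimate, not something I measured directly):

```python
import math

# My setup (assumption: per-device batch size, all 6 GPUs in data parallel)
per_device_batch = 32
grad_accum = 4
num_gpus = 6
num_examples = 8_000_000
epoch_seconds = 8 * 3600  # ~8 hours per epoch

effective_batch = per_device_batch * grad_accum * num_gpus
steps_per_epoch = math.ceil(num_examples / effective_batch)
seconds_per_step = epoch_seconds / steps_per_epoch

print(effective_batch)                 # 768 examples per optimizer step
print(steps_per_epoch)                 # ~10417 optimizer steps per epoch
print(round(seconds_per_step, 2))      # ~2.76 s per optimizer step
```

So the question is whether ~2.8 seconds per optimizer step (768 sequences of length 64) is reasonable on this hardware.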