Smart Batching - speed up BERT fine-tuning

I tried fine-tuning for 1 epoch with smart batching: training time decreased and accuracy increased. But when I fine-tune for 3-4 epochs, it looks like the model has already learned to pay less attention to the pad tokens, so smart batching no longer increases accuracy.
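
For anyone unfamiliar with the idea, here is a minimal sketch of smart batching in plain Python. It is not my actual training code: the function names `smart_batches` and `count_pad_tokens` are made up for illustration, and I am assuming the pad token id is 0 (as in the standard BERT vocab). The idea is to sort samples by length, cut them into batches, and pad each batch only to its own longest sequence instead of a fixed max length.

```python
import random

PAD_ID = 0  # assumption: pad token id is 0, as in the standard BERT vocab


def smart_batches(sequences, batch_size, pad_id=PAD_ID, seed=0):
    """Group token-id lists of similar length and pad each batch to its own max length."""
    # Sort sample indices by sequence length so neighbours have similar lengths.
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    # Cut the sorted order into consecutive chunks of batch_size.
    chunks = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle whole batches (not samples) so training does not see lengths in order.
    random.Random(seed).shuffle(chunks)

    batches = []
    for chunk in chunks:
        max_len = max(len(sequences[i]) for i in chunk)
        input_ids, attention_mask = [], []
        for i in chunk:
            n_pad = max_len - len(sequences[i])
            input_ids.append(sequences[i] + [pad_id] * n_pad)
            attention_mask.append([1] * len(sequences[i]) + [0] * n_pad)
        batches.append({
            "indices": chunk,               # original positions, to look up labels
            "input_ids": input_ids,
            "attention_mask": attention_mask,
        })
    return batches


def count_pad_tokens(batches):
    """Count padded positions (mask == 0) across all batches."""
    return sum(row.count(0) for b in batches for row in b["attention_mask"])
```

As a toy example, with sequences of lengths 3, 10, 4, 9, 2, 8 and `batch_size=2`, this produces 6 pad tokens in total, while padding every sequence to the longest length (10) would produce 24. Fewer pad tokens per batch is where the time saving comes from.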