code: Smart Batching Tutorial - Speed Up BERT Training · Chris McCormick
I tried fine-tuning for 1 epoch: training time decreased and accuracy increased. When I increased training to 3-4 epochs, though, the model seems to have learned to pay less attention to the pad tokens, so smart batching no longer gives an accuracy gain.
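For reference, a rough sketch of the padding difference being discussed. This is a simplified illustration, not the tutorial's exact code: the helper names (`pad_batch`, `smart_batches`, `count_pads`) are made up for this example, and a pad id of 0 is assumed. It sorts sequences by length, cuts them into batches, and pads each batch only to its own longest sequence, then compares how many pad positions the model sees against padding everything to one fixed length.

```python
import random

PAD_ID = 0  # assumption: pad token id is 0


def pad_batch(seqs, pad_id=PAD_ID):
    """Pad each sequence to the longest one in this batch.

    Returns (input_ids, attention_mask); mask is 1 for real tokens, 0 for padding.
    """
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask


def smart_batches(seqs, batch_size):
    """Sort by length, slice into batches, shuffle batch order, pad per batch."""
    ordered = sorted(seqs, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.shuffle(batches)  # keep some randomness in the order batches are seen
    return [pad_batch(b) for b in batches]


def count_pads(batches):
    """Count padded positions across all batches using the attention masks."""
    return sum(row.count(0) for _, mask in batches for row in mask)


# Toy token-id sequences with lengths 2, 9, 3 and 8 (ids are placeholders).
seqs = [[101] * 2, [101] * 9, [101] * 3, [101] * 8]

fixed = [pad_batch(seqs)]        # everything padded to the global max length
smart = smart_batches(seqs, 2)   # each batch padded to its own max length

print(count_pads(fixed))  # 14
print(count_pads(smart))  # 2
```

Fewer pad positions per step is where the speedup comes from. Whether it also changes accuracy after several epochs is the open question above; the sketch only shows the padding counts.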