For folks who are still struggling, I think I found one potential reason why training on TPU is slow: padding. I set `padding=True` in my tokenizer and I'm already seeing a speedup in my TPU training, so it looks like the slowdown didn't have anything to do with my torch_xla installation after all.
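In case it helps anyone, here's a minimal sketch of why padding matters here (the `pad_batch` helper and the token IDs are made up for illustration): padding every sequence to the same length keeps batch shapes static, and XLA-compiled TPU code recompiles whenever it sees a new input shape, so unpadded, variable-length batches can make training look much slower than it should be.

```python
def pad_batch(sequences, pad_id=0, max_length=8):
    """Pad (or truncate) every token-ID list to max_length so that
    every batch fed to the model has the same static shape."""
    return [
        seq[:max_length] + [pad_id] * (max_length - len(seq[:max_length]))
        for seq in sequences
    ]

# Two sequences of different lengths become one uniformly shaped batch.
batch = pad_batch([[101, 7592, 102], [101, 2129, 2024, 2017, 102]])
print([len(row) for row in batch])  # every row has length 8
```

This is roughly what the tokenizer does internally once padding is enabled; with static shapes, XLA compiles the computation graph once and reuses it, instead of recompiling for each new sequence length.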