GPU OOM when training

Thank you, Bram! You are totally right: the issue was caused by a problematic data sample of extremely large size. On top of that, my batch size didn't fit with the max_seq_length; when I used padding="max_length" I got an OOM on the very first batch, which was expected. I reduced my batch size (sad) and truncated the samples to tokenizer.model_max_length.
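For anyone hitting the same thing, here is a minimal sketch of the truncation fix, assuming a standard transformers tokenizer; the checkpoint name and sample texts are placeholders:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute your own model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

texts = ["a short sample", "a very long sample " * 5000]  # dummy data

# Truncate every sample to the model's maximum input length so a single
# oversized sample can't blow up GPU memory, and pad only to the longest
# sample in the batch instead of padding everything to max_length.
batch = tokenizer(
    texts,
    truncation=True,
    max_length=tokenizer.model_max_length,
    padding="longest",
    return_tensors="pt",
)
```

Padding to the longest sample per batch (rather than "max_length") keeps memory proportional to the actual data, which is what let me avoid the OOM on the first batch.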