UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
It works now that I've set `padding_side='right'` on the tokenizer. Why does the padding side affect overflow in half-precision training?
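The warning itself doesn't explain the mechanism. The toy, pure-Python sketch below illustrates one plausible explanation, which is an assumption here. With left padding and a causal mask, the leading pad positions of a shorter sequence can only attend to other pad positions, so their whole attention row is masked. Suppose a model builds its mask by adding a causal mask and a padding mask that each use the most negative float16 value (-65504). In that case the sum overflows to -inf in half precision, and a softmax over a row of -inf values yields NaN. With right padding, every query can see at least the first real token, so no row is fully masked. The pad id `0` and all helper names are hypothetical, and Python's `struct` `'e'` (half-precision) format stands in for real float16 tensors.

```python
import math
import struct

PAD = 0              # hypothetical pad token id; real ids in this toy batch are nonzero
FP16_MIN = -65504.0  # most negative finite float16 value


def pad_batch(seqs, side):
    """Pad each sequence to the batch's max length on 'left' or 'right'."""
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [PAD] * (width - len(s))
        padded.append(fill + s if side == "left" else s + fill)
    return padded


def fully_masked_queries(row):
    """Query positions q whose causal window (keys 0..q) contains only padding."""
    return [q for q in range(len(row)) if all(t == PAD for t in row[: q + 1])]


def to_fp16(x):
    """Round a Python float to float16; overflow becomes +/-inf as in IEEE arithmetic."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax(scores):
    """Max-subtracted softmax; a row that is entirely -inf still produces NaN."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


seqs = [[5, 6, 7, 8], [9, 10]]
for side in ("left", "right"):
    batch = pad_batch(seqs, side)
    # left:  [[5, 6, 7, 8], [0, 0, 9, 10]] -> fully masked rows [[], [0, 1]]
    # right: [[5, 6, 7, 8], [9, 10, 0, 0]] -> fully masked rows [[], []]
    print(side, batch, [fully_masked_queries(r) for r in batch])

# Assumed mechanism: causal-mask and padding-mask constants are added together.
combined = to_fp16(FP16_MIN + FP16_MIN)
print(combined)                       # -inf: the sum does not fit in float16
print(softmax([combined, combined]))  # [nan, nan]: a fully masked row becomes NaN
```

In float32 the same sum (-131008) is representable, so the row stays finite and softmax returns a uniform distribution instead of NaN. That would be consistent with the warning being specific to half-precision training.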