df['train'] = df['test']
This overwrites the train split with the test split, which shrinks the dataset and looks prone to overfitting, but it was most likely set up this way just for quick testing.
I don't see any obvious issues with the data preprocessing code. One thing I'm unsure about: should padding tokens be excluded from the labels, and does that convention differ between Seq2Seq and CausalLM?
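On the padding question, the usual Hugging Face convention is to replace padding positions in the labels with -100 so the loss ignores them (for Seq2Seq, `DataCollatorForSeq2Seq` does this for you; for CausalLM you often copy `input_ids` to labels and mask pads yourself). A minimal sketch, assuming a hypothetical pad token id of 0:

```python
PAD_TOKEN_ID = 0     # assumed for illustration; use tokenizer.pad_token_id
IGNORE_INDEX = -100  # the default ignore_index of PyTorch's CrossEntropyLoss

def mask_pad_labels(input_ids):
    """Build labels from input_ids, masking pad positions so they
    contribute nothing to the loss."""
    return [tok if tok != PAD_TOKEN_ID else IGNORE_INDEX for tok in input_ids]

print(mask_pad_labels([5, 9, 3, 0, 0]))  # [5, 9, 3, -100, -100]
```

If labels keep the raw pad token id instead, the model is also trained to predict padding, which can skew the loss on short sequences.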
The learning rate is slightly high (though I don't think it's a problem on its own), and the absence of LoRA dropout and weight decay settings may be contributing to the overfitting.
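If overfitting is the concern, both knobs are cheap to try. A config sketch using `peft` and `transformers`; the specific values below are illustrative, not tuned:

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Illustrative values only -- not tuned for this dataset.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,   # regularizes the LoRA adapter layers
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="out",
    learning_rate=1e-4,  # a step down if the current LR feels high
    weight_decay=0.01,   # mild L2-style regularization via AdamW
)
```

Even small values of `lora_dropout` (0.05–0.1) and `weight_decay` (0.01) often help when the training set is this small.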