Hi,
You’re using the SFTTrainer class with a seq2seq model like T5. I don’t think that’s supported (cc @lewtun). I’d recommend using a decoder-only model like Llama or Mistral.