Fine-tuning T5 for translation

SnailTheSnail · November 9, 2021, 5:37pm

Hi, I am trying to fine-tune a T5 model for translation. The sentence pairs look fine after tokenization, but something is still wrong, and I get the following error:

AssertionError: You should supply an encoding or a list of encodings to this method.

My dataset consists of pairs of English and French strings, for example:
"translate English to French: Is this realistic?", "Est-ce réaliste?"

This is my code:

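import pandas as pd
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments

dataset = pd.read_excel('en-fr.xlsx')
checkpoint = 't5-base'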

#tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer(list(dataset['eng']), list(dataset['fr']), padding='longest', truncation=True, return_tensors='pt')
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

#model
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

#arguments
training_args = TrainingArguments('trainer_args')

#trainer
trainer = Trainer(model=model, args=training_args, train_dataset=inputs, data_collator=data_collator)
trainer.train()

Thank you in advance for any advice.
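
A possible direction, offered as a sketch rather than a confirmed fix: passing both lists to the tokenizer in one call encodes each English/French pair as a single two-sentence input, so the model never receives labels. In addition, Trainer expects a dataset whose items are dicts containing input_ids, which the raw tokenizer output probably does not provide when indexed by integer, and that may be what triggers the assertion. The version below tokenizes sources and targets separately, wraps them in a small dataset class, and swaps in DataCollatorForSeq2Seq so that labels are padded as well. TranslationDataset, build_trainer, and max_length=128 are illustrative names and values, not part of the original code.

```python
class TranslationDataset:
    """Map-style dataset whose items are dicts with input_ids, attention_mask, labels."""

    def __init__(self, tokenizer, sources, targets, max_length=128):
        # Encode sources and targets in separate calls; a single call with both
        # lists would treat each pair as one two-sentence input.
        self.features = tokenizer(sources, truncation=True, max_length=max_length)
        self.labels = tokenizer(targets, truncation=True, max_length=max_length)["input_ids"]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # No padding here: the data collator pads each batch dynamically.
        return {
            "input_ids": self.features["input_ids"][idx],
            "attention_mask": self.features["attention_mask"][idx],
            "labels": self.labels[idx],
        }


def build_trainer(sources, targets, checkpoint="t5-base"):
    # Hypothetical helper; transformers is imported here so the class above
    # has no hard dependency on it.
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    train_dataset = TranslationDataset(tokenizer, sources, targets)
    # Pads input_ids and attention_mask, and pads labels with -100 by default
    # so padded positions are ignored by the loss.
    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
    return Trainer(
        model=model,
        args=TrainingArguments("trainer_args"),
        train_dataset=train_dataset,
        data_collator=data_collator,
    )
```

With the DataFrame from the post, this would be used as trainer = build_trainer(list(dataset['eng']), list(dataset['fr'])) followed by trainer.train(). The sketch assumes the 'eng' column already carries the "translate English to French: " prefix, as in the example pair above.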
