Train tokenizer for seq2seq model

Hi all,

I am trying to train a seq2seq model using T5 v1.1.
My task is simple: I want to map text from format A to format B,
for example, 19 July 2020 → 19/07/2020. This is not my real data, just an example of what I am doing.

I want to train the tokenizer for this data, but for the seq2seq model, the tokenizer needs to tokenize both input data and label, right?

So, I am a bit confused about arranging the data to train the tokenizer.
Should I concatenate the input data and labels together and then pass them to old_tokenizer.train_new_from_iterator?
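For context, something like this is what I had in mind (the inputs/targets lists are dummy stand-ins for my data, and the checkpoint name in the comment is just a guess):

```python
# Sketch: build one iterator over BOTH source and target strings, since the
# tokenizer only needs to see raw text, not (input, label) pairs.
inputs = ["19 July 2020", "1 January 1999"]   # format A (placeholder data)
targets = ["19/07/2020", "01/01/1999"]        # format B (placeholder data)

def corpus_iterator(batch_size=1000):
    """Yield batches of raw text drawn from both sides of the pairs."""
    corpus = inputs + targets
    for i in range(0, len(corpus), batch_size):
        yield corpus[i : i + batch_size]

# Then (assuming a T5 v1.1 checkpoint) the call from the NLP Course would be:
# from transformers import AutoTokenizer
# old_tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")
# new_tokenizer = old_tokenizer.train_new_from_iterator(
#     corpus_iterator(), vocab_size=32000
# )
```

Is this the right way to arrange the data, or should the two sides be kept separate somehow?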

I have read the doc *Training a new tokenizer from an old one* in the Hugging Face NLP Course,
but I am still confused about the seq2seq setting.