Flan-T5 - Finetuning to a Longer Sequence Length (512 -> 2048 tokens): Will it work?

Dear HF forum,

I am planning to finetune Flan-T5.
However, for my task I need a longer sequence length (2048 tokens).

The model currently has a maximum sequence length of 512 tokens.

According to related posts on the topic, I understand that T5 uses relative position embeddings, so it can handle longer sequence lengths in principle.
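
For reference, this is roughly what I was planning to try. The data files, column names ("input"/"target"), and hyperparameters below are just placeholders for my setup, not something I have verified end to end:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/flan-t5-base"

# The tokenizer reports model_max_length=512, but T5's relative position
# bias is bucketed rather than tied to a fixed length, so it can be overridden.
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=2048)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder data files and column names for my dataset.
raw = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})

def preprocess(batch):
    model_inputs = tokenizer(batch["input"], max_length=2048, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-2048",
    per_device_train_batch_size=1,       # 2048-token inputs are memory-hungry
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    learning_rate=1e-4,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```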

However, will finetuning it at a longer sequence length than it was trained on result in a sub-par model?

Thank you

Anuj


Very interested in this topic too.

Hey @anujn, were you able to do this? Any updates on the results and accuracy you got?

Maybe you should use a Longformer-based model instead: allenai/led-base-16384 · Hugging Face
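
LED handles long inputs natively (up to 16k tokens). A minimal inference sketch, assuming the usual global-attention-on-the-first-token pattern and a placeholder input string:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

long_document = "your long input text here"  # placeholder
inputs = tokenizer(long_document, max_length=2048, truncation=True, return_tensors="pt")

# LED uses windowed local attention plus a handful of global tokens;
# putting global attention on the first token is the common default.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```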