Flan-T5 - Finetuning to a Longer Sequence Length (512 -> 2048 tokens): Will it work?

Dear HF forum,

I am planning to finetune Flan-T5.
However, for my task I need a longer sequence length (2048 tokens).

The model currently has a maximum sequence length of 512 tokens.

According to related posts on the topic, I understand that T5 uses relative position embeddings, so it can handle longer sequence lengths in principle.
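
For reference, this is roughly what I was planning to try. The data files, column names ("input"/"target"), and hyperparameters below are just placeholders for my setup, not something I have verified end to end:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/flan-t5-base"

# The tokenizer reports model_max_length=512, but T5's relative position
# bias is bucketed rather than tied to a fixed length, so it can be overridden.
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=2048)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder data files and column names for my dataset.
raw = load_dataset("json", data_files={"train": "train.jsonl", "validation": "val.jsonl"})

def preprocess(batch):
    model_inputs = tokenizer(batch["input"], max_length=2048, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-2048",
    per_device_train_batch_size=1,       # 2048-token inputs are memory-hungry
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    learning_rate=1e-4,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```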

However, will finetuning it at a longer sequence length than it was trained on result in a sub-par model?

Thank you

Anuj


Very interested in this topic too.

Hey @anujn, were you able to do this? Any updates on the results and accuracy you got?

Maybe you should use a Longformer-based model instead: allenai/led-base-16384 · Hugging Face
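
LED handles long inputs natively (up to 16k tokens). A minimal inference sketch, assuming the usual global-attention-on-the-first-token pattern and a placeholder input string:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

long_document = "your long input text here"  # placeholder
inputs = tokenizer(long_document, max_length=2048, truncation=True, return_tensors="pt")

# LED uses windowed local attention plus a handful of global tokens;
# putting global attention on the first token is the common default.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```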