Using a LongT5 model with a T5 checkpoint

The LongT5 paper states that LongT5 is compatible with T5 checkpoints:

> We experiment with two attention mechanism variations for LongT5 […]: (1) Local Attention and (2) Transient Global Attention (TGlobal). Both variations preserve several properties of T5: […] compatibility with T5 checkpoints.
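
(As I understand it, the released LongT5 checkpoints correspond to these two variants, e.g. google/long-t5-local-base and google/long-t5-tglobal-base, and the variant is selected through `encoder_attention_type` in the config. Rough sketch, the config values below are only illustrative:)

```python
from transformers import LongT5Config

# Sketch: the encoder attention variant is chosen via the config.
# "local" -> Local Attention, "transient-global" -> TGlobal.
local_config = LongT5Config(encoder_attention_type="local")
tglobal_config = LongT5Config(encoder_attention_type="transient-global")
```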

How do I accomplish this with `transformers`?

```python
from transformers import T5Tokenizer, LongT5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = LongT5ForConditionalGeneration.from_pretrained("t5-small")
```

The console output tells me that the parameter names differ between the two models:

- `encoder.block.0.layer.0.SelfAttention.o.weight` for T5
- `encoder.block.0.layer.0.LocalSelfAttention.o.weight` for LongT5

Is there a way to map the weight names when loading a checkpoint? Or should I download the checkpoint and modify the files before loading it?
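
To make the second option concrete, this is roughly the mapping I have in mind: a minimal sketch assuming the only encoder-side difference is the `SelfAttention` → `LocalSelfAttention` renaming (the decoder appears to keep T5's names), and the copied config fields below are my guess at what has to match:

```python
from transformers import (
    LongT5Config,
    LongT5ForConditionalGeneration,
    T5ForConditionalGeneration,
)

# 1) Load the T5 checkpoint normally and grab its weights.
t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
t5_state = t5.state_dict()

# 2) Build a LongT5 config that mirrors t5-small's architecture
#    (assumption: these are the fields that need to agree).
config = LongT5Config(
    vocab_size=t5.config.vocab_size,
    d_model=t5.config.d_model,
    d_kv=t5.config.d_kv,
    d_ff=t5.config.d_ff,
    num_layers=t5.config.num_layers,
    num_decoder_layers=t5.config.num_decoder_layers,
    num_heads=t5.config.num_heads,
    relative_attention_num_buckets=t5.config.relative_attention_num_buckets,
    feed_forward_proj=t5.config.feed_forward_proj,
    tie_word_embeddings=t5.config.tie_word_embeddings,
    encoder_attention_type="local",  # Local Attention variant
)
model = LongT5ForConditionalGeneration(config)

# 3) Rename the encoder self-attention keys to LongT5's local-attention
#    names; decoder keys keep T5's "SelfAttention"/"EncDecAttention" names.
mapped_state = {}
for name, tensor in t5_state.items():
    if name.startswith("encoder.") and ".SelfAttention." in name:
        name = name.replace(".SelfAttention.", ".LocalSelfAttention.")
    mapped_state[name] = tensor

# 4) strict=False reports anything that still does not line up
#    instead of raising, so the leftover keys can be inspected.
missing, unexpected = model.load_state_dict(mapped_state, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```

For the TGlobal variant I would expect the extra global-attention parameters to have no T5 counterpart, so they should stay randomly initialized and show up among the missing keys.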

Background:

For my use case, I have long input sequences and frequently run into memory problems. LongT5 looks like a good candidate, but the smallest officially released checkpoint is `google/long-t5-<type>-base`. That’s why I would like to try the `t5-small` checkpoint instead.
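
If the mapping above works, I would then run the model on my long inputs roughly like this (the input text and the 4096-token limit are only placeholders):

```python
# Hypothetical usage of the converted model on a long input;
# the text and max_length are illustrative, not real data.
long_text = "summarize: " + " ".join(["lorem ipsum"] * 3000)
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```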