Longformer for Encoder-Decoder with gradient checkpointing

I’m struggling to find the right transformers class for my task.
I want to solve a seq2seq problem with an encoder-decoder Longformer. I generated one from this German RoBERTa model using this script.
I know that I could use EncoderDecoderModel(), but the issue is that it doesn’t support gradient checkpointing, which I desperately need; without it the model won’t fit on my machine.
And if I understand it correctly, LEDModel() only takes an already built encoder-decoder model, not a plain Longformer to chain together, so that is also not an option.
I thought about initializing two separate Longformers for encoder and decoder with LongformerModel(), but then I don’t know how to glue them together. Can someone explain how that would work?
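
For reference, the usual "glue" is the EncoderDecoderModel wrapper itself, which accepts two already instantiated models. A minimal sketch with placeholder checkpoint paths; note the assumption that the decoder is a RoBERTa-style causal LM, since LongformerModel has no cross-attention layers and (as far as I can tell) cannot act as the decoder:

```python
from transformers import (
    EncoderDecoderModel,
    LongformerModel,
    RobertaConfig,
    RobertaForCausalLM,
)

# Placeholder paths -- substitute the converted Longformer and a German
# RoBERTa checkpoint of your choice.
encoder = LongformerModel.from_pretrained("path/to/german-longformer")

decoder_config = RobertaConfig.from_pretrained(
    "path/to/german-roberta",
    is_decoder=True,           # run the decoder stack causally
    add_cross_attention=True,  # attend to the encoder's hidden states
)
decoder = RobertaForCausalLM.from_pretrained(
    "path/to/german-roberta", config=decoder_config
)

# EncoderDecoderModel wires the decoder's cross-attention to the encoder.
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
```
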
Or does anyone have other suggestions for how I can solve this problem?
Thank you very much!

I found a solution that at least helps a little:
When using EncoderDecoderModel(), it is possible to enable gradient checkpointing at least on the encoder part:
model.encoder.config.gradient_checkpointing = True
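
Putting it together, a minimal end-to-end sketch of that workaround, assuming placeholder checkpoint paths and an older transformers release that reads the flag from the sub-model’s config (newer releases expose gradient_checkpointing_enable() on the model instead):

```python
from transformers import EncoderDecoderModel

# Placeholder paths: the converted Longformer as encoder, a RoBERTa-style
# checkpoint as decoder. from_encoder_decoder_pretrained sets is_decoder
# and add_cross_attention on the decoder automatically.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "path/to/german-longformer",
    "path/to/german-roberta",
)

# Gradient checkpointing on the encoder only: older transformers versions
# read this flag from the encoder's config at forward time.
model.encoder.config.gradient_checkpointing = True

# On newer releases the equivalent call would be:
# model.encoder.gradient_checkpointing_enable()
```
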
