Memory efficiency when using softprompts

arunwzd · May 15, 2022, 8:25am

I have split my inputs into several categories and obtained a softprompt for each category. I want to train a T5 encoder-decoder model with the matching softprompt added to the encoder output.

I tried attaching the softprompt to every datapoint before it is sent to the model, by adding another key (alongside input_ids and attention_mask) to each datapoint. But I think this will blow up memory usage, since every datapoint then carries its own softprompt tensor of size (seq_length, d_model). Is there a way to make this more memory-efficient, for example by referencing a shared tensor instead of sending the values with every datapoint? Or is it already efficient because of the way the Arrow dataset loads data during training?
Alternatively, I thought of adding the softprompt tensors just before the decoder's forward method, but I am not sure whether that is efficient. I would be glad to hear your thoughts. @sgugger @patrickvonplaten
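
To make the "referencing" idea concrete, here is a rough sketch of what I have in mind, not a tested solution. Each datapoint would carry only an integer category id, and the softprompts would live once in a shared table that is indexed per batch. The class name `SoftPromptTable`, the `category_ids` key, the toy sizes, and the choice to prepend the prompt along the sequence dimension are all my own assumptions.

```python
import torch
import torch.nn as nn


class SoftPromptTable(nn.Module):
    """Hypothetical helper: one shared softprompt per category."""

    def __init__(self, num_categories, prompt_length, d_model):
        super().__init__()
        # Placeholder values; the already-obtained softprompts would be loaded here.
        self.prompts = nn.Parameter(
            torch.randn(num_categories, prompt_length, d_model) * 0.02
        )

    def forward(self, encoder_hidden, attention_mask, category_ids):
        # Indexing materializes prompts only for the examples in this batch.
        batch_prompts = self.prompts[category_ids].to(encoder_hidden.dtype)
        # Assumption: the prompt is prepended along the sequence dimension.
        hidden = torch.cat([batch_prompts, encoder_hidden], dim=1)
        # Extend the mask so the prompt positions are attended to.
        prompt_mask = attention_mask.new_ones(
            attention_mask.size(0), batch_prompts.size(1)
        )
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return hidden, mask


torch.manual_seed(0)
table = SoftPromptTable(num_categories=3, prompt_length=4, d_model=8)
encoder_hidden = torch.randn(3, 5, 8)        # stand-in for the T5 encoder output
attention_mask = torch.ones(3, 5, dtype=torch.long)
category_ids = torch.tensor([0, 2, 0])       # one integer per datapoint
hidden, mask = table(encoder_hidden, attention_mask, category_ids)
print(hidden.shape, mask.shape)
```

With this layout the dataset stores one integer per datapoint instead of a (seq_length, d_model) tensor, and the full prompt exists only once in the model plus once per batch. If I read the T5 code correctly, the extended hidden states could then be handed to `T5ForConditionalGeneration` through its `encoder_outputs` argument together with the extended `attention_mask`, but I have not verified that end to end.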
