Memory efficiency when using softprompts

  1. I have split my input into several categories and obtained softprompts for those categories. I want to train a model with a softprompt added to the encoder output of a T5 encoder-decoder model.
  • I tried adding softprompts to every datapoint before sending it to the model, by adding another key (alongside input_ids and attention_mask) to every datapoint's input. But I think this will blow up memory usage, since every input datapoint gets appended with a softprompt tensor of size (seq_length, d_model) in T5. Is there a way to make this more memory-efficient (e.g. storing a reference per datapoint instead of a copy of the tensor)? Or is it already efficient given the way the Arrow dataset loads data during training? I sketched what I mean by "referencing" below the list.

  • Alternatively, I considered adding the softprompt tensors only to the encoder output right before the decoder's forward pass, but I'm not sure if that's efficient (second sketch below). Would be glad to hear your thoughts. @sgugger @patrickvonplaten
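
Here is a rough sketch of the "referencing" idea from the first bullet: each datapoint only stores an integer category id, and the actual (prompt_length, d_model) tensors live once in a shared parameter that is indexed at forward time. The names `num_categories`, `prompt_length`, and `attach_soft_prompt` are made up for illustration, and I'm assuming the prompt is prepended along the sequence dimension:

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration; d_model=512 matches t5-small.
num_categories = 4
prompt_length = 20
d_model = 512

# One learnable soft prompt per category, shared across all datapoints,
# so the dataset never materializes a prompt tensor per row.
soft_prompts = nn.Parameter(torch.randn(num_categories, prompt_length, d_model))

def attach_soft_prompt(encoder_hidden_states, attention_mask, category_ids):
    """Prepend the per-category soft prompt to the encoder output.

    encoder_hidden_states: (batch, seq_len, d_model)
    attention_mask:        (batch, seq_len)
    category_ids:          (batch,) integer tensor stored with each datapoint
    """
    batch_size = encoder_hidden_states.size(0)
    prompts = soft_prompts[category_ids]            # (batch, prompt_length, d_model)
    hidden = torch.cat([prompts, encoder_hidden_states], dim=1)
    # Extend the attention mask so the decoder can attend to the prompt positions.
    prompt_mask = attention_mask.new_ones(batch_size, prompt_length)
    mask = torch.cat([prompt_mask, attention_mask], dim=1)
    return hidden, mask
```

This way the only per-datapoint overhead is one integer, and `soft_prompts` just needs to be added to the optimizer's parameter list.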
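
And here is roughly how I picture the second option, reusing `attach_soft_prompt` from the sketch above: run the encoder once, modify its hidden states, and hand them to the full model via `encoder_outputs` so only the decoder pass sees the prompt. This is just how I imagine it, not tested:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

batch = tokenizer(["translate English to German: Hello"], return_tensors="pt")
labels = tokenizer(["Hallo"], return_tensors="pt").input_ids
category_ids = torch.tensor([0])  # category id stored with the datapoint

# 1. Run the encoder on the raw inputs.
encoder_outputs = model.encoder(
    input_ids=batch.input_ids, attention_mask=batch.attention_mask
)

# 2. Prepend the per-category soft prompt to the encoder hidden states.
hidden, mask = attach_soft_prompt(
    encoder_outputs.last_hidden_state, batch.attention_mask, category_ids
)

# 3. Pass the modified encoder output to the seq2seq forward; the decoder
#    then cross-attends over the prompt positions plus the original tokens.
outputs = model(
    encoder_outputs=BaseModelOutput(last_hidden_state=hidden),
    attention_mask=mask,
    labels=labels,
)
loss = outputs.loss
```

My uncertainty is whether splitting the forward like this loses anything compared to modifying the inputs, and whether it plays nicely with `generate`.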