- I have split my input into several categories and obtained soft prompts for each category. I want to train a T5 encoder-decoder model with the soft prompt added to the encoder output.
I tried adding the soft prompt to every datapoint before sending it to the model, by adding another key to each input (and extending the attention_mask accordingly). But I think this will blow up memory usage, since every input datapoint would be appended with a soft-prompt tensor of shape (prompt_length, d_model) in T5. Is there a way to make this more memory-efficient (e.g., storing a reference per datapoint instead of copying the values into every one)? Or is it already efficient given how the Arrow dataset loads data during training?
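For reference, here is one way I imagine avoiding the per-datapoint copies: store only an integer category id with each example, keep the soft prompts as a single shared parameter tensor, and concatenate the right prompt to the encoder output at forward time. This is a minimal sketch with a hypothetical `SoftPromptInjector` module (the name and the idea of injecting after the encoder are my assumptions, not an existing API); with T5 you would run the encoder yourself, inject, and then pass the result to the model via `encoder_outputs`:

```python
import torch
import torch.nn as nn

class SoftPromptInjector(nn.Module):
    """Holds one learned soft prompt per category and prepends it to the
    encoder output at forward time. Each datapoint only carries an integer
    category id, so no (prompt_length, d_model) tensor is stored per example."""

    def __init__(self, num_categories: int, prompt_len: int, d_model: int):
        super().__init__()
        # Shared parameter: (num_categories, prompt_len, d_model)
        self.prompts = nn.Parameter(torch.randn(num_categories, prompt_len, d_model) * 0.02)
        self.prompt_len = prompt_len

    def forward(self, encoder_hidden, attention_mask, category_ids):
        # encoder_hidden: (batch, seq_len, d_model)
        # attention_mask: (batch, seq_len)
        # category_ids:   (batch,) integer tensor
        batch = encoder_hidden.size(0)
        prompt = self.prompts[category_ids]                  # (batch, prompt_len, d_model)
        hidden = torch.cat([prompt, encoder_hidden], dim=1)  # prepend prompt
        ones = attention_mask.new_ones(batch, self.prompt_len)
        mask = torch.cat([ones, attention_mask], dim=1)      # extend the mask to cover it
        return hidden, mask
```

In a training step this would look roughly like: run `model.encoder(input_ids, attention_mask=mask)`, inject, wrap the result in `transformers.modeling_outputs.BaseModelOutput(last_hidden_state=hidden)`, and call the full model with `encoder_outputs=` and the extended attention mask. The Arrow dataset then never materializes the prompt tensors at all, only the ids.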