I want to extract embeddings from several different models, concatenate them, and use the result to train a custom model.
I already have my custom model which is a variation of DebertaV2 that accepts a bigger hidden size for embeddings. I already have my concatenated embeddings, which are a list of tensors of shape (512, 2560).
My question is: how do I pass these concatenated embeddings to the Trainer?
With input_ids it's simple enough, because each example is one-dimensional, with shape (seq_length,), so it fits in a regular dataset column. With inputs_embeds, each example is two-dimensional, with shape (seq_length, hidden_size).
I have tried creating a TensorDataset, but as far as I am aware it has no named columns, so I don't believe the Trainer can recognize which tensor is my inputs_embeds and which is my labels.
Should I create a custom dataset for this purpose? Is there an alternative? Or am I doing things wrong?
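To illustrate the problem, here is a small sketch of what a TensorDataset returns per item. The shapes are shrunken toy values chosen for the example (not my real (512, 2560) data), and the variable names are made up:

```python
import torch
from torch.utils.data import TensorDataset

# Toy stand-ins: 4 examples, seq_length 8, hidden_size 16.
# (Assumed shapes for illustration; the real embeddings are (512, 2560).)
embeds = torch.randn(4, 8, 16)
labels = torch.tensor([0, 1, 0, 1])

dataset = TensorDataset(embeds, labels)
item = dataset[0]

# Each item is a plain tuple, so nothing names which tensor is which.
print(type(item).__name__)
print(item[0].shape, item[1].shape)
```

Since each item is a positional tuple rather than a dict, there are no keys like inputs_embeds or labels to match against the model's forward arguments.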
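For reference, this is roughly what I imagine the custom-dataset route would look like. It is only a sketch under assumptions: the class name EmbeddingDataset and the function collate_embeddings are hypothetical, the shapes are toy values, and it assumes the model's forward accepts inputs_embeds and labels as keyword arguments. It also leaves out an attention_mask, which padded sequences would probably need.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class EmbeddingDataset(Dataset):
    """Hypothetical map-style dataset that returns named fields per example."""

    def __init__(self, embeddings, labels):
        # embeddings: list of tensors, each of shape (seq_length, hidden_size)
        # labels: list of ints (or scalar tensors), one per example
        if len(embeddings) != len(labels):
            raise ValueError("embeddings and labels must have the same length")
        self.embeddings = embeddings
        self.labels = labels

    def __len__(self):
        return len(self.embeddings)

    def __getitem__(self, idx):
        # Keys are chosen to match the keyword arguments of the model's forward.
        return {
            "inputs_embeds": self.embeddings[idx],
            "labels": torch.as_tensor(self.labels[idx]),
        }


def collate_embeddings(batch):
    """Stack per-example dicts into one batch dict (assumes equal seq_length)."""
    return {
        "inputs_embeds": torch.stack([ex["inputs_embeds"] for ex in batch]),
        "labels": torch.stack([ex["labels"] for ex in batch]),
    }


# Toy data: 3 examples, seq_length 8, hidden_size 16 (real data is (512, 2560)).
toy_embeddings = [torch.randn(8, 16) for _ in range(3)]
toy_labels = [0, 1, 0]

dataset = EmbeddingDataset(toy_embeddings, toy_labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=collate_embeddings)

batch = next(iter(loader))
print(batch["inputs_embeds"].shape)  # torch.Size([2, 8, 16])
print(batch["labels"].shape)         # torch.Size([2])
```

If this is the right idea, I assume the dataset would go to the Trainer as train_dataset and the collate function as data_collator, but I have not verified that end to end with my model.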
Would love some feedback!