Embeddings on custom model using trainer

pacoelflaco · December 15, 2022, 9:54pm

I want to extract embeddings from different models, concatenate them together and use that to train a custom model.

I already have my custom model, which is a variation of DebertaV2 that accepts a larger hidden size for the embeddings. I also already have my concatenated embeddings, which are a list of tensors of shape (512, 2560).
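Here is a minimal sketch of how a (512, 2560) tensor can come from concatenating per-token embeddings of two models along the hidden dimension. The hidden sizes 1024 and 1536 are hypothetical placeholders that happen to sum to 2560, and the random tensors stand in for real model outputs. The sketch also assumes both models' outputs line up position by position (same tokens, both padded or truncated to 512), which concatenation along the last dimension requires.

```python
import torch

# Hypothetical hidden sizes for two source models; they sum to 2560.
SEQ_LEN = 512
HIDDEN_A = 1024
HIDDEN_B = 1536


def concat_embeddings(emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
    """Concatenate two (seq_len, hidden) tensors along the hidden dimension."""
    if emb_a.shape[0] != emb_b.shape[0]:
        raise ValueError("both embeddings must cover the same number of positions")
    return torch.cat([emb_a, emb_b], dim=-1)


# Random stand-ins for the per-token outputs of two different models.
emb_a = torch.randn(SEQ_LEN, HIDDEN_A)
emb_b = torch.randn(SEQ_LEN, HIDDEN_B)
combined = concat_embeddings(emb_a, emb_b)
print(combined.shape)  # torch.Size([512, 2560])
```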

My question is: how do I pass on these concatenated embeddings to the trainer?

With input_ids it's simple enough to build a dataset, because each example is only one-dimensional. With inputs_embeds, however, each example is two-dimensional: (seq_length, hidden_size).
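The extra dimension should not be a problem in itself. Hugging Face models document inputs_embeds as a float tensor of shape (batch_size, sequence_length, hidden_size), so each example only needs to be a (seq_length, hidden_size) tensor, and batching adds the leading dimension exactly as it does for input_ids. A small sketch with random stand-in tensors:

```python
import torch

SEQ_LEN, HIDDEN = 512, 2560

# One example of each kind: token IDs are 1-D, embeddings are 2-D.
ids_example = torch.randint(0, 1000, (SEQ_LEN,))
embeds_example = torch.randn(SEQ_LEN, HIDDEN)

# Stacking examples adds a leading batch dimension in both cases.
ids_batch = torch.stack([ids_example, ids_example])
embeds_batch = torch.stack([embeds_example, embeds_example])

print(ids_batch.shape)     # torch.Size([2, 512])
print(embeds_batch.shape)  # torch.Size([2, 512, 2560])
```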

I have tried creating a TensorDataset, but as far as I am aware it does not use named columns, so I don't believe the Trainer can tell which tensor is my inputs_embeds and which is my labels.
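One way to give the fields names is a small custom Dataset whose __getitem__ returns a dict keyed by the model's forward() argument names. EmbeddingsDataset is a hypothetical name, and the sketch assumes one integer class label per example (for token-level tasks the label would be a per-token tensor instead). Shapes are shrunk from (512, 2560) to keep the demo light:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class EmbeddingsDataset(Dataset):
    """Hypothetical dataset that returns named fields instead of bare tuples."""

    def __init__(self, embeddings, labels):
        # embeddings: list of (seq_len, hidden) float tensors
        # labels: list of ints, one class label per example (assumed)
        if len(embeddings) != len(labels):
            raise ValueError("embeddings and labels must have the same length")
        self.embeddings = embeddings
        self.labels = labels

    def __len__(self):
        return len(self.embeddings)

    def __getitem__(self, idx):
        # Keys should match the argument names of the model's forward().
        return {
            "inputs_embeds": self.embeddings[idx],
            "labels": torch.tensor(self.labels[idx], dtype=torch.long),
        }


# Small random stand-in data.
embeddings = [torch.randn(16, 32) for _ in range(4)]
labels = [0, 1, 0, 1]
dataset = EmbeddingsDataset(embeddings, labels)

# PyTorch's default collate stacks each dict key, so batches stay named.
batch = next(iter(DataLoader(dataset, batch_size=2)))
print(batch["inputs_embeds"].shape)  # torch.Size([2, 16, 32])
print(batch["labels"])               # tensor([0, 1])
```

To train, the dataset would be passed as train_dataset to the Trainer. As far as I know, when no tokenizer is given the Trainer falls back to default_data_collator, which also stacks tensor fields key by key, and with the default remove_unused_columns=True it drops keys that don't match forward() arguments, so the names matter.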

Should I create a custom dataset for this purpose? Is there an alternative, or am I doing something wrong?
Would love some feedback!
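As an alternative, the TensorDataset can stay as it is and the names can be attached at batching time with a collate function. named_collator is a hypothetical helper, and it assumes each TensorDataset item is an (embeddings, label) pair:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader


def named_collator(features):
    """Hypothetical collator: turn (embeds, label) tuples into a named batch."""
    embeds, labels = zip(*features)
    return {
        "inputs_embeds": torch.stack(embeds),
        "labels": torch.stack(labels),
    }


# Stand-in data: 4 examples of shape (16, 32) and one label each.
all_embeds = torch.randn(4, 16, 32)
all_labels = torch.tensor([0, 1, 1, 0])
tensor_ds = TensorDataset(all_embeds, all_labels)

loader = DataLoader(tensor_ds, batch_size=2, collate_fn=named_collator)
batch = next(iter(loader))
print(sorted(batch))    # ['inputs_embeds', 'labels']
print(batch["labels"])  # tensor([0, 1])
```

The same function could be handed to the Trainer through its data_collator argument. If the Trainer's column filtering gets in the way, setting remove_unused_columns=False in TrainingArguments turns it off.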
