Setting a different embedding dim than the original model when training

Hello, since embeddings take up a large amount of memory, is it possible, when training another model (derived from a previous one), to set a different (smaller) dimensionality parameter in Pooling, like:

# Use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
word_embedding_model = models.Transformer(model_name) #Original model works with 1024 dim

# Apply mean pooling to get one fixed sized sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

Then, instead of using get_word_embedding_dimension(), is it possible to simply set the desired dimension for the model being trained?

Thanks so much for your attention; I am asking this question so I can present some options to my team…