Easiest way to get a sentence embedder from a transformers model?

Currently I load each model via AutoModel.from_pretrained and then, depending on the architecture, add a pooling layer or not (BERT has a pooler, for example). Is there any class in the transformers library that can be used for feature extraction from an input text with less hassle? I'm trying to experiment with multiple models, and this approach is a bit unclean, to be fair.
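For context, one architecture-agnostic way to sidestep per-model poolers is to mean-pool the last hidden state using the attention mask. A minimal sketch (the helper name and the commented model name are illustrative, not from this thread):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) array of token embeddings
    attention_mask:    (batch, seq_len) array of 1s (real tokens) and 0s (padding)
    Returns a (batch, hidden) array of sentence embeddings.
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)  # sum over real tokens only
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

if __name__ == "__main__":
    # Hypothetical usage with any AutoModel (requires transformers + torch):
    # from transformers import AutoTokenizer, AutoModel
    # tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    # model = AutoModel.from_pretrained("bert-base-uncased")
    # enc = tok(["hello world"], return_tensors="pt", padding=True)
    # out = model(**enc)
    # emb = mean_pool(out.last_hidden_state.detach().numpy(),
    #                 enc["attention_mask"].numpy())
    pass
```

This works the same for any encoder that returns a `last_hidden_state`, so no per-architecture branching is needed.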

You may want to look into pipelines. You can use the FeatureExtractionPipeline to get the final hidden states of the base model and then pool them however you wish. However, because you are specifically interested in sentence embeddings (one vector per input instead of one per token), you'll need a model that was trained to produce sentence representations for the vectors to be meaningful. I'd recommend using sentence-transformers for that.
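A sketch of the pooling step: the feature-extraction pipeline returns nested lists shaped roughly (1, num_tokens, hidden_size) per text, so averaging over the token axis gives one fixed-size vector. The helper name and the commented model names are examples, not prescribed by the thread:

```python
import numpy as np

def pool_pipeline_output(features):
    """Mean-pool FeatureExtractionPipeline output for one input text.

    features: nested lists of shape (1, num_tokens, hidden_size)
    Returns a (hidden_size,) vector.
    """
    token_vectors = np.asarray(features)[0]  # (num_tokens, hidden_size)
    return token_vectors.mean(axis=0)        # average over the token axis

if __name__ == "__main__":
    # Hypothetical usage (requires transformers; model name is an example):
    # from transformers import pipeline
    # extractor = pipeline("feature-extraction", model="bert-base-uncased")
    # vec = pool_pipeline_output(extractor("This is a test."))
    #
    # With sentence-transformers, pooling is built into the model
    # (model name is an example):
    # from sentence_transformers import SentenceTransformer
    # model = SentenceTransformer("all-MiniLM-L6-v2")
    # vec = model.encode("This is a test.")
    pass
```

The sentence-transformers route skips the manual pooling entirely, and its models are trained so that the resulting vectors are meaningful for similarity comparisons.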
