Easiest way to get a sentence embedder from a transformers model?

Currently I load each model via AutoModel.from_pretrained and then, depending on the architecture, add a pooling layer or not (BERT has a pooler, for example). Is there any class in the transformers library that can be used for feature extraction from an input text with less hassle? I'm trying to experiment with multiple models, and this approach is a bit unclean, to be fair.
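For context, one architecture-agnostic way to sidestep per-model poolers is to mean-pool the last hidden state using the attention mask. A minimal sketch (the helper name and the commented model name are illustrative, not from this thread):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden) array of token embeddings
    attention_mask:    (batch, seq_len) array of 1s (real tokens) and 0s (padding)
    Returns a (batch, hidden) array of sentence embeddings.
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)  # sum over real tokens only
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

if __name__ == "__main__":
    # Hypothetical usage with any AutoModel (requires transformers + torch):
    # from transformers import AutoTokenizer, AutoModel
    # tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    # model = AutoModel.from_pretrained("bert-base-uncased")
    # enc = tok(["hello world"], return_tensors="pt", padding=True)
    # out = model(**enc)
    # emb = mean_pool(out.last_hidden_state.detach().numpy(),
    #                 enc["attention_mask"].numpy())
    pass
```

This works the same for any encoder that returns a `last_hidden_state`, so no per-architecture branching is needed.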

You may want to look into pipelines. You can use the FeatureExtractionPipeline to get the final hidden states of the base model and then pool them however you wish. However, because you are specifically interested in sentence embeddings (one vector per input instead of one per token), you'll need a model that was trained to produce sentence representations for the vectors to be meaningful. I'd recommend using sentence-transformers for that.
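A sketch of the pooling step: the feature-extraction pipeline returns nested lists shaped roughly (1, num_tokens, hidden_size) per text, so averaging over the token axis gives one fixed-size vector. The helper name and the commented model names are examples, not prescribed by the thread:

```python
import numpy as np

def pool_pipeline_output(features):
    """Mean-pool FeatureExtractionPipeline output for one input text.

    features: nested lists of shape (1, num_tokens, hidden_size)
    Returns a (hidden_size,) vector.
    """
    token_vectors = np.asarray(features)[0]  # (num_tokens, hidden_size)
    return token_vectors.mean(axis=0)        # average over the token axis

if __name__ == "__main__":
    # Hypothetical usage (requires transformers; model name is an example):
    # from transformers import pipeline
    # extractor = pipeline("feature-extraction", model="bert-base-uncased")
    # vec = pool_pipeline_output(extractor("This is a test."))
    #
    # With sentence-transformers, pooling is built into the model
    # (model name is an example):
    # from sentence_transformers import SentenceTransformer
    # model = SentenceTransformer("all-MiniLM-L6-v2")
    # vec = model.encode("This is a test.")
    pass
```

The sentence-transformers route skips the manual pooling entirely, and its models are trained so that the resulting vectors are meaningful for similarity comparisons.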
