Q1: What's the equivalent of SBERT's model.encode() when using the Hugging Face transformers library with AutoModel.from_pretrained('distilbert-base-cased')?
Here is the SBERT code that works, whose behavior I want to replicate with HF's DistilBERT:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
distilBERT_sentence_embeddings = model.encode(list(x_train), show_progress_bar=True)
Here is what I've tried, but it does not give me the embeddings I'm looking for:
from transformers import AutoTokenizer, AutoModel
import torch

# Use the Auto* classes that were actually imported (the original mixed in
# DistilBertTokenizer/DistilBertModel, which would raise a NameError)
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModel.from_pretrained('distilbert-base-uncased')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (1, seq_len, 768)
Note that I haven't plugged in the x_train list yet; the goal is to eventually substitute it for "Hello, my dog is cute" once the code works.
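For context on what I believe encode() does under the hood: my understanding is that SBERT's "mean-tokens" models mean-pool the per-token embeddings using the attention mask, rather than taking a single token's output. A minimal sketch of that pooling (the helper name mean_pool is mine, and the tensor shapes assume the HF DistilBERT output above) would be:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool token embeddings, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden) -- e.g. outputs.last_hidden_state
    attention_mask:    (batch, seq_len)         -- e.g. inputs["attention_mask"]
    """
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sentence
    return summed / counts                          # (batch, hidden)

# With the HF model above, sentence embeddings would then be:
# sentence_embeddings = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
```

If this is right, the per-sentence vectors it produces would play the same role as the rows returned by SBERT's model.encode().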
Q2: Additionally, this SBERT model is trained on an NLI task using a siamese network that requires pairs of inputs. Is it acceptable to use it for non-NLI tasks (such as sequence classification or sentiment analysis), and for tasks that don't have sentence pairs?