Hello, I am building a multilabel classifier that uses the embeddings from sentence-transformers/all-MiniLM-L6-v2 as input. After training a model that produces good enough results, I would like to run it in the browser using Transformers.js and the Xenova/all-MiniLM-L6-v2 model. However, I am getting different embeddings for the same text.
Here is my Python code:
from sentence_transformers import SentenceTransformer

model_name = "sentence-transformers/all-MiniLM-L6-v2"
mdl = SentenceTransformer(model_name)
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
se = mdl.encode(raw_inputs)  # numpy array of shape (2, 384)
# the first 4 dimensions...
# [[-0.0635541 0.00168205 0.08878317 0.01061784]
# [-0.0278877 0.02493023 0.01891949 0.03274209]]
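For reference, my understanding is that for this model SentenceTransformer.encode() amounts to mean pooling over the transformer's last hidden state followed by L2 normalization, which is what I expect the feature-extraction pipeline below to reproduce. Here is a rough sketch adapted from the model card's usage example (I have not verified that it matches encode() exactly):
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

tok = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModel.from_pretrained(model_name)

enc = tok(raw_inputs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = hf_model(**enc)

# mean pooling over non-padding tokens, as in the model card example
mask = enc["attention_mask"].unsqueeze(-1).float()
pooled = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# the model card example then L2-normalizes the sentence embeddings
pooled = F.normalize(pooled, p=2, dim=1)
print(pooled[:, :4])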
The JavaScript code:
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@latest';

let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
let result = await extractor([
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
], { pooling: 'mean', normalize: true });

// result.data is the flattened [2, 384] tensor, so the second sentence starts at index 384
console.log(result.data.slice(0, 4));     // first 4 dimensions of sentence 1
console.log(result.data.slice(384, 388)); // first 4 dimensions of sentence 2
// [-0.0713, 0.0169, 0.0940, 0.00842]
// [-0.0041, 0.0070, 0.0365, 0.0422]
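To rule out an indexing mistake on the flattened result.data array, I also read the values per sentence through the tensor shape. This is just a sanity-check sketch; I am assuming the returned Tensor exposes dims and tolist(), which is how I understand the Transformers.js output:
// result is a Tensor; dims should be [2, 384] for the two sentences
console.log(result.dims);
// tolist() should give one nested array per sentence
let [first, second] = result.tolist();
console.log(first.slice(0, 4));   // should match result.data.slice(0, 4) above
console.log(second.slice(0, 4));  // should match result.data.slice(384, 388) above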
I would like to reproduce the sentence-transformers embeddings. If that is not possible, I just need the Python and JavaScript embeddings to match each other, and I will retrain. My specific questions are:
- Am I doing this correctly?
- If so, can I get the JavaScript embeddings to match the Python ones?
Thank you