SentenceSimilarityInputsCheck expected dict not list: `__root__` in `parameters`


I’m new to ML/AI and haven’t experimented much yet, though I’m an experienced developer in my day job. I’m having a few issues I’m hoping someone can help me with. Here is my main goal.

I’m trying to write a bot that crawls a website, vectorizes all the text, and stores the vectors in Pinecone. The user can then ask a question: I query Pinecone for text that hopefully answers it and feed that text to ChatGPT so it can answer. Everything seems to work, but the semantic search over the embeddings doesn’t return text that actually answers the question.
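The retrieval step described above comes down to ranking stored text chunks by how close their vectors are to the question's vector. Below is a minimal in-memory TypeScript sketch of that idea, assuming the embeddings have already been computed by some model. The names `StoredChunk`, `topK`, and `buildPrompt` are hypothetical, and this is not Pinecone's or LangChain's API, only the same idea done locally with cosine similarity.

```typescript
// Minimal in-memory stand-in for "store vectors, then query by similarity".
// Assumption: embeddings are precomputed; this is NOT Pinecone's API.

interface StoredChunk {
  id: string;
  text: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query vector, best first.
function topK(
  query: number[],
  store: StoredChunk[],
  k: number,
): { chunk: StoredChunk; score: number }[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical prompt builder: put the retrieved text in front of the question.
function buildPrompt(question: string, context: StoredChunk[]): string {
  const contextText = context.map((c) => `- ${c.text}`).join("\n");
  return `Answer using only this context:\n${contextText}\n\nQuestion: ${question}`;
}

// Toy 3-dimensional vectors; real sentence-transformers models use hundreds of dimensions.
const store: StoredChunk[] = [
  { id: "a", text: "Opening hours are 9am to 5pm.", vector: [0.9, 0.1, 0.0] },
  { id: "b", text: "We ship worldwide.", vector: [0.0, 1.0, 0.1] },
  { id: "c", text: "The office closes at 5pm on Fridays.", vector: [0.8, 0.0, 0.3] },
];
const queryVector = [1.0, 0.0, 0.1]; // pretend embedding of "When are you open?"
const hits = topK(queryVector, store, 2);
console.log(hits.map((h) => h.chunk.id).join(", ")); // a, c
console.log(buildPrompt("When are you open?", hits.map((h) => h.chunk)));
```

Printing the top hits like this for a few real questions is a quick way to check whether the embeddings, rather than ChatGPT, are the weak link in the pipeline.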

I’m using LangChain.js for most of this. LangChain has a `HuggingFaceInferenceEmbeddings` class that I can use to vectorize text; by default it uses the `sentence-transformers/distilbert-base-nli-mean-tokens` model. I wanted to try a different model to see if I could get better results, but when I use `sentence-transformers/all-mpnet-base-v2` it throws the following error:

        throw new Error(output.error);
    Error: SentenceSimilarityInputsCheck expected dict not list: `__root__` in `parameters`
        at request (C:\projects\test\node_modules\.pnpm\@huggingface+inference@2.3.0\node_modules\@huggingface\inference\dist\index.js:136:15)
        at processTicksAndRejections (node:internal/process/task_queues:96:5)
        at async featureExtraction (C:\projects\test\node_modules\.pnpm\@huggingface+inference@2.3.0\node_modules\@huggingface\inference\dist\index.js:458:15)
        at async RetryOperation._fn (C:\projects\test\node_modules\.pnpm\p-retry@4.6.2\node_modules\p-retry\index.js:50:12) {
      attemptNumber: 7,
      retriesLeft: 0

There is a similar post, “Feature extraction using Inference API error”, but I’m not sure what the answer there means. I don’t understand the difference between sentence similarity and feature extraction, and I couldn’t find much when I tried to look it up. I just want to vectorize the text so I can store it in the database and query it. I did take that answer’s advice: if I use something like `questgen/all-mpnet-base-v2-feature-extraction-pipeline`, it works. But that model only has about 90 downloads, so I suspect I’m doing something wrong or misunderstanding something.

Any help would be appreciated. Thanks

Hi @Denaldo,

Our Inference API supports multiple tasks. For certain models, we provide a simple abstraction for embedding similarity, for example between sentences. The tag and/or `pipeline_tag` in a model's metadata determines which task the Inference API backend runs, for every compatible model on the Hub.

When a model is set for `sentence-similarity`, the backend sets up a sentence-similarity pipeline. It expects multiple sentences as input, transforms them into embeddings, and compares them using cosine similarity.
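A local TypeScript sketch of what the sentence-similarity task computes: one source sentence compared against several candidate sentences, giving one cosine score per candidate. Assumptions: the toy two-dimensional embeddings and the `embed` lookup are hypothetical stand-ins for the model, and the `source_sentence`/`sentences` field names are modeled on the Inference API's sentence-similarity input as I understand it.

```typescript
// Local sketch of the sentence-similarity task: one score per candidate sentence.
// Assumption: a toy `embed` lookup stands in for the model the backend would run.

// Input is an object (a "dict" in Python terms), not a list.
interface SentenceSimilarityInputs {
  source_sentence: string;
  sentences: string[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the source and every candidate, then compare each candidate to the source.
function sentenceSimilarity(
  inputs: SentenceSimilarityInputs,
  embed: (s: string) => number[],
): number[] {
  const source = embed(inputs.source_sentence);
  return inputs.sentences.map((s) => cosineSimilarity(source, embed(s)));
}

// Hypothetical toy embeddings, hand-picked so the example is easy to follow.
const toyEmbeddings: Record<string, number[]> = {
  "How do I reset my password?": [1, 0],
  "Password reset instructions": [1, 0],
  "Our office is in Berlin": [0, 1],
};
const embed = (s: string): number[] => {
  const v = toyEmbeddings[s];
  if (!v) throw new Error(`No toy embedding for: ${s}`);
  return v;
};

const scores = sentenceSimilarity(
  {
    source_sentence: "How do I reset my password?",
    sentences: ["Password reset instructions", "Our office is in Berlin"],
  },
  embed,
);
console.log(scores.join(", ")); // 1, 0
```

The output is a list of scores, not vectors, so this task on its own gives you nothing to store in a vector database such as Pinecone.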

When a model is set for `feature-extraction`, the backend expects an input sentence and returns the corresponding embedding vector.
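The two tasks expect differently shaped request bodies, which is a plausible explanation for the "expected dict not list" error. The stack trace in the question shows the request going through `featureExtraction` in `@huggingface/inference`, which sends texts as a list. The TypeScript sketch below contrasts the two shapes. Assumption: the shapes follow the Inference API's task documentation as I understand it; the helper function names are hypothetical.

```typescript
// Sketch of the two request bodies. Assumption: these mirror the JSON the
// Inference API accepts for each task; check the official task docs.

// feature-extraction: `inputs` is a string or a list of strings,
// and the response is one embedding vector per input.
type FeatureExtractionPayload = { inputs: string | string[] };

// sentence-similarity: `inputs` is an object (a "dict"),
// and the response is one similarity score per candidate sentence.
type SentenceSimilarityPayload = {
  inputs: { source_sentence: string; sentences: string[] };
};

function featureExtractionPayload(texts: string[]): FeatureExtractionPayload {
  return { inputs: texts };
}

function sentenceSimilarityPayload(source: string, candidates: string[]): SentenceSimilarityPayload {
  return { inputs: { source_sentence: source, sentences: candidates } };
}

// An embeddings client sends the first shape. If the model is tagged for
// sentence-similarity, the backend validates against the second shape, so a
// list arrives where an object is expected.
const chunks = ["First page chunk", "Second page chunk"];
console.log(JSON.stringify(featureExtractionPayload(chunks)));
// {"inputs":["First page chunk","Second page chunk"]}
console.log(JSON.stringify(sentenceSimilarityPayload("What are the opening hours?", chunks)));
// {"inputs":{"source_sentence":"What are the opening hours?","sentences":["First page chunk","Second page chunk"]}}
```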

So if you need a model that is not set up for the feature-extraction pipeline, you can duplicate it and set the correct tag and `pipeline_tag`.
You can compare how different models are tagged to see the difference.
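To make "set the correct tag and `pipeline_tag`" concrete, here is a hypothetical TypeScript helper that rewrites the YAML metadata header of a duplicated model's README.md from `sentence-similarity` to `feature-extraction`. It is a sketch, not a Hugging Face tool: it assumes a simple header with one key per line and tags written as `- item`, and the sample header is a toy, not a copy of any real model card. Editing the README by hand in the duplicated repository achieves the same thing.

```typescript
// Hypothetical helper (not a Hugging Face tool): retarget a README's YAML
// metadata header from one task to another.
// Assumption: simple "---" delimited header, one key per line, tags as "- item".
// This is NOT a general YAML parser.
function retagReadme(readme: string, fromTask: string, toTask: string): string {
  const lines = readme.split("\n");
  if (lines[0] !== "---") throw new Error("README has no YAML metadata header");
  const end = lines.indexOf("---", 1);
  if (end === -1) throw new Error("YAML metadata header is not closed");

  const header: string[] = [];
  for (const line of lines.slice(1, end)) {
    let next = line;
    // pipeline_tag decides which task the Inference API runs for the model.
    if (line.trim() === `pipeline_tag: ${fromTask}`) next = `pipeline_tag: ${toTask}`;
    // The task name may also appear in the tags list.
    else if (line.trim() === `- ${fromTask}`) next = line.replace(fromTask, toTask);
    // Skip a tag line that would otherwise appear twice.
    if (next.trim() === `- ${toTask}` && header.some((h) => h.trim() === `- ${toTask}`)) continue;
    header.push(next);
  }
  // lines.slice(end) keeps the closing "---" and the README body unchanged.
  return ["---", ...header, ...lines.slice(end)].join("\n");
}

// Toy header for illustration only.
const original = [
  "---",
  "pipeline_tag: sentence-similarity",
  "tags:",
  "- sentence-transformers",
  "- feature-extraction",
  "- sentence-similarity",
  "---",
  "# Duplicated model",
].join("\n");

console.log(retagReadme(original, "sentence-similarity", "feature-extraction"));
```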

Thank you for replying; that makes a lot of sense. I’ve been a bit obsessive about this project, so I’ve put in an insane number of hours since I asked this question and have learned a lot.


I’m using the model `sentence-transformers/multi-qa-mpnet-base-dot-v1`.

When I try to use inference with `tiiuae/falcon-7b-instruct` or other models, I get the following error, regardless of how I edit the README file:

    Chain run errored with error: "SentenceSimilarityInputsCheck expected dict not list: `__root__` in `parameters`"

Perhaps the bug is on the LangChain side?

@nephel, are you trying to use `sentence-transformers/multi-qa-mpnet-base-dot-v1` for feature extraction or sentence similarity?

Sentence similarity

I’ve also seen this LangChain.js GitHub issue: “Error: SentenceSimilarityInputsCheck” (hwchase17/langchainjs, issue #1083).

