Hi community,
I’m new here and I have some questions about getting embeddings via the HF Inference API and other methods, plus a few architecture-level doubts. Any help is truly appreciated; I have been learning a ton already, so thanks in advance.
I fine-tuned a model with QLoRA and hosted it on HF (here), and I need to get embeddings from it.
-
Firstly, GPT-4 told me this:
"we often use the last layer's outputs when we're looking for rich, contextual embeddings."
and a similar blog post (here) says something similar:
"The BERT base model uses 12 layers of transformer encoders as discussed, and each output per token from each layer of these can be used as a word embedding! Perhaps you wonder which is the best, though?"
Is this factually correct? I want to rule out hallucination.
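To check my understanding of what "the last layer's outputs" means in practice, here is a minimal sketch of how I think the last hidden state would be pulled out and mean-pooled into a sentence embedding (the model name is just an example, and the mean pooling is my assumption):

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer(["An example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, seq_len, hidden_dim): the last layer's per-token outputs
token_embeddings = outputs.last_hidden_state

# mean pooling over non-padding tokens gives one vector per input sentence
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embeddings.shape)  # (1, hidden_dim)
-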
Now, assuming the above is correct, which layer's embeddings does the HF Inference API return? HF tells me I can use the Inference API like so:
import requests
model_id = "sentence-transformers/all-MiniLM-L6-v2"
hf_token = "get your token in http://hf.co/settings/tokens"
api_url = f"https://api-inference.huggingface.co/pipeline/feature-extraction/{model_id}"
headers = {"Authorization": f"Bearer {hf_token}"}
def query(texts):
    response = requests.post(api_url, headers=headers, json={"inputs": texts, "options": {"wait_for_model": True}})
    return response.json()
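If I understand the feature-extraction pipeline correctly, calling it should return one embedding (a list of floats) per input text, along these lines:

texts = ["How do I get embeddings?", "Sentence embeddings via the Inference API"]
embeddings = query(texts)
print(len(embeddings), len(embeddings[0]))  # number of texts, embedding dimension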
But trying this with my 7B model in Colab didn't work: the model never finished loading (it never ran out of memory, it just kept running). Note: I haven't tried on a rented GPU yet (maybe that works).
-
Also, I found no way to use a quantised model (TheBloke’s GGML/GGUF) to get embeddings from the Inference API (please point me to one if it exists).
-
On a side note, I also tried generating embeddings with a quantised model via llama.cpp: using TheBloke/llama-2-7b-GGUF, the `embedding` command works just like running inference with `main`. But I suppose that since the model is quantised to, say, 4 or 8 bits, the embeddings will also be less precise and won't exactly match the full-precision ones, right?
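In case it matters, here is the equivalent via the llama-cpp-python bindings, which I assume does the same thing as the `embedding` binary (the GGUF filename below is just a placeholder for a file from TheBloke/Llama-2-7B-GGUF):

# pip install llama-cpp-python
from llama_cpp import Llama

# load the quantised GGUF model with embedding mode enabled
llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", embedding=True)

result = llm.create_embedding("An example sentence")
embedding = result["data"][0]["embedding"]  # list of floats
print(len(embedding))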
Thanks for bearing with my questions and, again, any help is truly appreciated.