Make Text Embedding Server compatible

I cloned the embedding model udever-bloom-1b1.
At first the server asked for ONNX files, which the model did not have, so I converted it to ONNX.
Then I pushed the converted model to my repo (Saugatkafley/udever-bloom-1b1-onnx).
I also set {"max_position_embeddings": 2048} in config.json. After that, I copied 1_Pooling/config.json from the jinaai embeddings model:

{
  "word_embedding_dimension": 1536,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}

Now, when I run the embedding server, it fails with the following output:

2024-08-05T09:52:57.224194Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-08-05T09:52:57.307450Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-05T09:53:00.166636Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-05T09:53:00.405920Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-05T09:53:00.405936Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-05T09:53:00.885433Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-05T09:53:05.337696Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-08-05T09:53:05.990842Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 5.584921999s
2024-08-05T09:53:06.456293Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[BOQ]' was expected to have ID '250680' but was given ID 'None'    
2024-08-05T09:53:06.456307Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[EOQ]' was expected to have ID '250681' but was given ID 'None'    
2024-08-05T09:53:06.456310Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[BOD]' was expected to have ID '250682' but was given ID 'None'    
2024-08-05T09:53:06.456313Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '[EOD]' was expected to have ID '250683' but was given ID 'None'    
2024-08-05T09:53:06.457094Z  WARN text_embeddings_router: router/src/lib.rs:195: Could not find a Sentence Transformers config
2024-08-05T09:53:06.457162Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 2048
2024-08-05T09:53:06.457515Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-08-05T09:53:06.936365Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor h.8.input_layernorm.weight failed.GetFileLength for /data/models--Saugatkafley--udever-bloom-1b1-onnx/snapshots/698acf469fd193b51dd1125dbd460c8258c7b606/model.onnx_data failed:Invalid fd was supplied: -1

Hi @Saugatkafley, can you also share the command you used to launch the server? And which Docker image did you use?

I used the CPU version, since I don't have a GPU.
How do I make an embedding model compatible with the server?
What are the requirements?

This is the shell script I use to run the embedding server:

MODEL="Saugatkafley/udever-bloom-1b1-onnx"
VOLUME=/home/saugat/Desktop/UBUNTU_desktop/MODELS/embeddings

docker run -p 8080:80 -v $VOLUME:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $MODEL
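
The error message points at model.onnx_data, while the download log only lists model.onnx. That suggests the exported graph stores its weights in an external data file which never reached the snapshot directory. ONNX exports larger than 2 GB have to keep their weights in such a separate file, which would fit a model of this size. A minimal sketch for checking this on the host, assuming the cache layout shown in the error with /data mapped to $VOLUME; check_onnx_sidecar is a hypothetical helper name, not part of text-embeddings-inference:

```shell
#!/bin/sh
# Sketch: report whether a snapshot directory contains both the ONNX graph
# and the external weights file named in the error (model.onnx_data).
# check_onnx_sidecar is a hypothetical helper, not a TEI command.
check_onnx_sidecar() {
    dir="$1"
    # The graph itself must be present.
    if [ ! -f "$dir/model.onnx" ]; then
        echo "missing: model.onnx"
        return 1
    fi
    # The external data file must sit next to the graph.
    if [ ! -f "$dir/model.onnx_data" ]; then
        echo "missing: model.onnx_data"
        return 2
    fi
    echo "ok"
    return 0
}

# Example call (snapshot path taken from the error, /data replaced by $VOLUME):
# check_onnx_sidecar "$VOLUME/models--Saugatkafley--udever-bloom-1b1-onnx/snapshots/698acf469fd193b51dd1125dbd460c8258c7b606"
```

If this prints "missing: model.onnx_data", the server is loading a graph whose weights file is absent from the snapshot, which would match the "Invalid fd was supplied: -1" failure. It only confirms the symptom; whether the server image can fetch external data files at all is not something the log settles.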