Embedding endpoint returning [None] embeddings

I’m trying to deploy dunzhang/stella_en_1.5B_v5 · Hugging Face as an embedding model. The endpoint launches successfully, but the returned embeddings are all [None, None, None, …]. It’s also worth noting that the returned embeddings have dimension 1536, while the model’s default is actually 1024…
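For reference, this is roughly how I’m querying the endpoint (a sketch: the URL and token below are placeholders, and the response is assumed to be a list with one embedding per input sentence):

```python
import requests

# Placeholders: substitute your own endpoint URL and an access token.
ENDPOINT_URL = "https://<endpoint-name>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"inputs": "What is the capital of France?"},
)
response.raise_for_status()

# One embedding per input sentence.
embedding = response.json()[0]
print(len(embedding))   # 1536, although the model's default output dimension is 1024
print(embedding[:5])    # [None, None, None, None, None]
```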
I tried two things: simply deploying the model from the official repo above with “Sentence Embeddings” as the task, and creating a custom endpoint handler using sentence_transformers as described in the Inference Endpoints guide (cloned repo here: lgbird/stella_1.5B_custom · Hugging Face) with the task set to “Custom”. Both give the exact same result!
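For context, the custom handler follows the pattern from that guide; it is roughly this (a simplified sketch, not the exact file from the linked repo):

```python
from typing import Any, Dict, List

from sentence_transformers import SentenceTransformer


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the local snapshot of the repository on the endpoint.
        # trust_remote_code may be needed since stella ships custom modeling code.
        self.model = SentenceTransformer(path, trust_remote_code=True)

    def __call__(self, data: Dict[str, Any]) -> List[List[float]]:
        # Inference Endpoints pass the request JSON here; sentences arrive under "inputs".
        inputs = data.get("inputs", [])
        if isinstance(inputs, str):
            inputs = [inputs]
        # Return plain Python lists so the result serializes cleanly to JSON.
        return self.model.encode(inputs).tolist()
```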
I’m not sure what’s going wrong or what to try next… I’ll add the logs of the “standard” endpoint (the one from the official model repo) below; the ones from the custom endpoint handler are the same (and look fine to me). Thanks in advance for any help!

- 2024-10-04T17:33:52.482+00:00 {"timestamp":"2024-10-04T17:33:52.482653Z","level":"INFO","message":"Args { model_id: \"/rep****ory\", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: \"r-lgbird-stella-embeddings-exqg6mkq-96fea-57q2f\", port: 80, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/repository/cache\"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740581Z","level":"WARN","message":"Warning: Token '<|endoftext|>' was expected to have ID '151643' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740612Z","level":"WARN","message":"Warning: Token '<|im_start|>' was expected to have ID '151644' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740618Z","level":"WARN","message":"Warning: Token '<|im_end|>' was expected to have ID '151645' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.742+00:00 {"timestamp":"2024-10-04T17:33:52.742207Z","level":"INFO","message":"Maximum number of tokens per request: 512","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199}
- 2024-10-04T17:33:52.742+00:00 {"timestamp":"2024-10-04T17:33:52.742496Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":28}
- 2024-10-04T17:33:52.872+00:00 {"timestamp":"2024-10-04T17:33:52.871997Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":241}
- 2024-10-04T17:33:53.094+00:00 {"timestamp":"2024-10-04T17:33:53.094540Z","level":"INFO","message":"Starting FlashQwen2 model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":364}
- 2024-10-04T17:34:09.333+00:00 {"timestamp":"2024-10-04T17:34:09.333134Z","level":"INFO","message":"Warming up model","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":257}
- 2024-10-04T17:34:11.624+00:00 {"timestamp":"2024-10-04T17:34:11.624385Z","level":"WARN","message":"Invalid hostname, defaulting to 0.0.0.0","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":319}
- 2024-10-04T17:34:11.626+00:00 {"timestamp":"2024-10-04T17:34:11.626272Z","level":"INFO","message":"Starting HTTP server: 0.0.0.0:80","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":1778}
- 2024-10-04T17:34:11.626+00:00 {"timestamp":"2024-10-04T17:34:11.626296Z","level":"INFO","message":"Ready","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":1779}
- 2024-10-04T17:34:39.911+00:00 {"timestamp":"2024-10-04T17:34:39.911560Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"45.087126ms","queue_time":"322.203µs","tokenization_time":"1.567907ms","total_time":"47.05461ms","name":"embed"},"spans":[{"inference_time":"45.087126ms","queue_time":"322.203µs","tokenization_time":"1.567907ms","total_time":"47.05461ms","name":"embed"}]}
- 2024-10-04T17:39:13.793+00:00 {"timestamp":"2024-10-04T17:39:13.793842Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"20.27212ms","queue_time":"204.423µs","tokenization_time":"157.999µs","total_time":"20.693654ms","name":"embed"},"spans":[{"inference_time":"20.27212ms","queue_time":"204.423µs","tokenization_time":"157.999µs","total_time":"20.693654ms","name":"embed"}]}
- 2024-10-04T17:40:43.781+00:00 {"timestamp":"2024-10-04T17:40:43.781391Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"17.160806ms","queue_time":"205.497µs","tokenization_time":"299.634µs","total_time":"17.715855ms","name":"embed"},"spans":[{"inference_time":"17.160806ms","queue_time":"205.497µs","tokenization_time":"299.634µs","total_time":"17.715855ms","name":"embed"}]}
- 2024-10-04T17:41:09.750+00:00 {"timestamp":"2024-10-04T17:41:09.750369Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"16.989841ms","queue_time":"201.616µs","tokenization_time":"354.533µs","total_time":"17.603286ms","name":"embed"},"spans":[{"inference_time":"16.989841ms","queue_time":"201.616µs","tokenization_time":"354.533µs","total_time":"17.603286ms","name":"embed"}]}
- 2024-10-04T17:43:24.712+00:00 {"timestamp":"2024-10-04T17:43:24.712499Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"72.323516ms","queue_time":"54.927283ms","tokenization_time":"2.630849ms","total_time":"174.609825ms","name":"embed"},"spans":[{"inference_time":"72.323516ms","queue_time":"54.927283ms","tokenization_time":"2.630849ms","total_time":"174.609825ms","name":"embed"}]}
- 2024-10-04T17:44:03.242+00:00 {"timestamp":"2024-10-04T17:44:03.242162Z","level":"INFO","message":"signal received, starting graceful shutdown","target":"text_embeddings_router::shutdown","filename":"router/src/shutdown.rs","line_number":27}

It may be quicker to send mentions (@ + username) to the authors of these models.

Same issue here using Alibaba-NLP/gte-modernbert-base


There is a possibility that it is a bug in Sentence Transformers, in TEI (Text Embeddings Inference, which the logs show the endpoint is running), or in the server config.
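One way to narrow it down is to load the model locally with sentence_transformers and compare against what the endpoint returns. If the local embeddings are finite floats of the expected dimension, the problem is more likely in the serving stack or its config. Note also that JSON has no NaN value, so NaNs produced server-side often come back as null/None in the response. A rough sketch (trust_remote_code is assumed to be needed for this model):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Load the model from the Hub; trust_remote_code may be required here.
model = SentenceTransformer("dunzhang/stella_en_1.5B_v5", trust_remote_code=True)

emb = model.encode(["What is the capital of France?"])
print(emb.shape)            # expected (1, 1024) with the default config
print(np.isnan(emb).any())  # True here would point at the model/dtype rather than the server
```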