I’m trying to deploy dunzhang/stella_en_1.5B_v5 (on Hugging Face) as an embedding model. The endpoint launches successfully, but the returned embeddings are all [None, None, None, …]. The returned embeddings also have dimension 1536, whereas the model’s default is 1024.
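To make the symptom concrete, here is a minimal sketch of how a response can be checked for missing values and for its dimension. It assumes the endpoint returns a JSON array of vectors; the helper name `summarize_embedding` and the sample payload are made up for illustration. Python's `json` module decodes JSON `null` as `None`, which is why the values show up as `None` on the client side.

```python
import json
import math


def summarize_embedding(vector):
    """Return the dimension and the number of missing / non-finite entries."""
    missing = sum(1 for v in vector if v is None)
    non_finite = sum(
        1 for v in vector if isinstance(v, float) and not math.isfinite(v)
    )
    return {"dim": len(vector), "missing": missing, "non_finite": non_finite}


# Hypothetical response body (shortened); a real vector would have 1536 entries.
raw = "[[null, null, null]]"
embeddings = json.loads(raw)  # JSON null is decoded as Python None
print(summarize_embedding(embeddings[0]))
# {'dim': 3, 'missing': 3, 'non_finite': 0}
```

Running the same check on a real response shows at a glance whether every entry is missing or only some, and which dimension the server actually returns.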
I tried two approaches: deploying the model directly from the official repo above with “Sentence Embeddings” as the task, and creating a custom endpoint handler using sentence_transformers as described in the inference endpoints guide (cloned repo here: lgbird/stella_1.5B_custom on Hugging Face) with the task set to “Custom”. Both give exactly the same result!
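For context, a custom handler of this kind generally has the shape below. This is a minimal sketch and not the actual handler file from the cloned repo: the `model` argument is a hypothetical addition so the class can be exercised without loading weights, the `{"inputs": ...}` request shape and `{"embeddings": ...}` response shape are assumptions, and `trust_remote_code=True` is also an assumption about what this model needs.

```python
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = "", model: Any = None):
        # `model` is a hypothetical injection point for local testing;
        # on the endpoint only `path` (the repository directory) is passed.
        if model is None:
            from sentence_transformers import SentenceTransformer

            # Assumption: the model repo ships custom code.
            model = SentenceTransformer(path, trust_remote_code=True)
        self.model = model

    def __call__(self, data: Dict[str, Any]) -> Dict[str, List[List[float]]]:
        # Assumed request shape: {"inputs": "text"} or {"inputs": ["a", "b"]}.
        inputs = data.get("inputs", data)
        if isinstance(inputs, str):
            inputs = [inputs]
        embeddings = self.model.encode(inputs)
        # Convert to plain Python floats so the result is JSON-serializable.
        return {"embeddings": [[float(x) for x in row] for row in embeddings]}
```

Calling the handler locally with a stand-in model is a cheap way to check whether `None` values come from the model output itself or appear only after deployment.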
I’m not sure what’s going wrong or what to try next. I’ll add the logs of the “standard” endpoint (the one from the official model repo) below; the logs from the custom endpoint handler are the same (and look fine to me). Thanks in advance for any help!
- 2024-10-04T17:33:52.482+00:00 {"timestamp":"2024-10-04T17:33:52.482653Z","level":"INFO","message":"Args { model_id: \"/rep****ory\", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: \"r-lgbird-stella-embeddings-exqg6mkq-96fea-57q2f\", port: 80, uds_path: \"/tmp/text-embeddings-inference-server\", huggingface_hub_cache: Some(\"/repository/cache\"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: \"text-embeddings-inference.server\", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740581Z","level":"WARN","message":"Warning: Token '<|endoftext|>' was expected to have ID '151643' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740612Z","level":"WARN","message":"Warning: Token '<|im_start|>' was expected to have ID '151644' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.740+00:00 {"timestamp":"2024-10-04T17:33:52.740618Z","level":"WARN","message":"Warning: Token '<|im_end|>' was expected to have ID '151645' but was given ID 'None'","log.target":"tokenizers::tokenizer::serialization","log.module_path":"tokenizers::tokenizer::serialization","log.file":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","log.line":159,"target":"tokenizers::tokenizer::serialization","filename":"/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs","line_number":159}
- 2024-10-04T17:33:52.742+00:00 {"timestamp":"2024-10-04T17:33:52.742207Z","level":"INFO","message":"Maximum number of tokens per request: 512","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199}
- 2024-10-04T17:33:52.742+00:00 {"timestamp":"2024-10-04T17:33:52.742496Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":28}
- 2024-10-04T17:33:52.872+00:00 {"timestamp":"2024-10-04T17:33:52.871997Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":241}
- 2024-10-04T17:33:53.094+00:00 {"timestamp":"2024-10-04T17:33:53.094540Z","level":"INFO","message":"Starting FlashQwen2 model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":364}
- 2024-10-04T17:34:09.333+00:00 {"timestamp":"2024-10-04T17:34:09.333134Z","level":"INFO","message":"Warming up model","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":257}
- 2024-10-04T17:34:11.624+00:00 {"timestamp":"2024-10-04T17:34:11.624385Z","level":"WARN","message":"Invalid hostname, defaulting to 0.0.0.0","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":319}
- 2024-10-04T17:34:11.626+00:00 {"timestamp":"2024-10-04T17:34:11.626272Z","level":"INFO","message":"Starting HTTP server: 0.0.0.0:80","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":1778}
- 2024-10-04T17:34:11.626+00:00 {"timestamp":"2024-10-04T17:34:11.626296Z","level":"INFO","message":"Ready","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":1779}
- 2024-10-04T17:34:39.911+00:00 {"timestamp":"2024-10-04T17:34:39.911560Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"45.087126ms","queue_time":"322.203µs","tokenization_time":"1.567907ms","total_time":"47.05461ms","name":"embed"},"spans":[{"inference_time":"45.087126ms","queue_time":"322.203µs","tokenization_time":"1.567907ms","total_time":"47.05461ms","name":"embed"}]}
- 2024-10-04T17:39:13.793+00:00 {"timestamp":"2024-10-04T17:39:13.793842Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"20.27212ms","queue_time":"204.423µs","tokenization_time":"157.999µs","total_time":"20.693654ms","name":"embed"},"spans":[{"inference_time":"20.27212ms","queue_time":"204.423µs","tokenization_time":"157.999µs","total_time":"20.693654ms","name":"embed"}]}
- 2024-10-04T17:40:43.781+00:00 {"timestamp":"2024-10-04T17:40:43.781391Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"17.160806ms","queue_time":"205.497µs","tokenization_time":"299.634µs","total_time":"17.715855ms","name":"embed"},"spans":[{"inference_time":"17.160806ms","queue_time":"205.497µs","tokenization_time":"299.634µs","total_time":"17.715855ms","name":"embed"}]}
- 2024-10-04T17:41:09.750+00:00 {"timestamp":"2024-10-04T17:41:09.750369Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"16.989841ms","queue_time":"201.616µs","tokenization_time":"354.533µs","total_time":"17.603286ms","name":"embed"},"spans":[{"inference_time":"16.989841ms","queue_time":"201.616µs","tokenization_time":"354.533µs","total_time":"17.603286ms","name":"embed"}]}
- 2024-10-04T17:43:24.712+00:00 {"timestamp":"2024-10-04T17:43:24.712499Z","level":"INFO","message":"Success","target":"text_embeddings_router::http::server","filename":"router/src/http/server.rs","line_number":706,"span":{"inference_time":"72.323516ms","queue_time":"54.927283ms","tokenization_time":"2.630849ms","total_time":"174.609825ms","name":"embed"},"spans":[{"inference_time":"72.323516ms","queue_time":"54.927283ms","tokenization_time":"2.630849ms","total_time":"174.609825ms","name":"embed"}]}
- 2024-10-04T17:44:03.242+00:00 {"timestamp":"2024-10-04T17:44:03.242162Z","level":"INFO","message":"signal received, starting graceful shutdown","target":"text_embeddings_router::shutdown","filename":"router/src/shutdown.rs","line_number":27}