Inference result not aligned with local version of same model and revision

Indeed, the replica's log doesn't seem to take any of the parameters provided in the UI into account.

The replica's log:

```
Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "r-rpelissier-sbw-fidi-labse-58w96y74-e4770-0t00y", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
```
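To make the mismatch concrete, here is a minimal sketch of how one might compare an embedding returned by the deployed endpoint against one computed locally. The endpoint URL and model ID below are placeholders/assumptions (the hostname suggests LaBSE, but the actual repository is redacted in the log), and the `normalize` flag is assumed to match the TEI `/embed` API:

```python
# Hypothetical repro sketch: compare a remote TEI embedding with a local one.
# ENDPOINT_URL and MODEL_ID are placeholders, not taken from the original log.
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def aligned(local_vec, remote_vec, tol=1e-3):
    """True when the two embeddings agree within tolerance."""
    return cosine(local_vec, remote_vec) > 1 - tol

if __name__ == "__main__":
    # Requires network access and:  pip install requests sentence-transformers
    import requests
    from sentence_transformers import SentenceTransformer

    ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud/embed"  # placeholder
    MODEL_ID = "sentence-transformers/LaBSE"  # assumption based on the hostname
    text = "Hello world"

    local = SentenceTransformer(MODEL_ID).encode([text], normalize_embeddings=True)[0].tolist()
    remote = requests.post(ENDPOINT_URL, json={"inputs": [text], "normalize": True}).json()[0]
    print("cosine:", cosine(local, remote), "aligned:", aligned(local, remote))
```

If the two vectors diverge well beyond floating-point tolerance, that would be consistent with the replica ignoring the configured parameters (e.g. pooling or dtype defaulting to `None` as seen in the log above).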
