Indeed, the replica log doesn't seem to take into account any of the parameters provided in the UI.
The replica log:
Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "r-rpelissier-sbw-fidi-labse-58w96y74-e4770-0t00y", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
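
In case it helps anyone compare what the replica actually started with against what was set in the UI, here is a rough sketch that parses an `Args { ... }` line like the one above and lists the fields that differ. The helper names (`parse_args`, `diff_settings`) and the `expected` values are made up for illustration; they are not part of Text Embeddings Inference or the Endpoints UI, and the sample line is shortened from the log above.

```python
import re

# Matches `key: value` pairs in a Rust Debug-style struct dump such as `Args { ... }`.
# Accepts straight or curly quotes, since forum software often rewrites quotes.
PAIR = re.compile(r'(\w+): (Some\([^)]*\)|["“][^"”]*["”]|[^,}\s]+)')


def parse_args(line: str) -> dict:
    """Return the fields of an `Args { ... }` log line as a dict of strings."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    return {key: value.strip('"“”') for key, value in PAIR.findall(body)}


def diff_settings(logged: dict, expected: dict) -> dict:
    """Map each mismatching key to a (logged, expected) pair."""
    return {
        key: (logged.get(key), want)
        for key, want in expected.items()
        if logged.get(key) != want
    }


# Shortened copy of the replica log line above.
log_line = (
    'Args { model_id: "/rep****ory", pooling: None, '
    "max_batch_tokens: 16384, auto_truncate: false }"
)

# ASSUMPTION: hypothetical values entered in the UI, for illustration only.
expected = {"max_batch_tokens": "32768", "auto_truncate": "true"}

for key, (got, want) in diff_settings(parse_args(log_line), expected).items():
    print(f"{key}: log has {got}, UI had {want}")
```

All values are compared as plain strings, so `None`, `false`, and numbers are left exactly as they appear in the log.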