Unable to Run Sentence Transformer Text Embedding in Docker

I have a Sentence Transformer model that generates embeddings for user queries. The model works fine locally, but when I build a Docker image and run it in Docker, it does not generate the embeddings.

I don’t see any errors in the Docker logs either; it just fails silently. I have shared the code below:

    import torch
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # `sentences` is the list of user queries, defined elsewhere in the app
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to("cpu")
    with torch.no_grad():
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)  # average pooling over tokens

I have tried running the container with privileged access and allocating maximum resources, but it still fails in Docker. I am completely baffled as to why this is not working.

Can anyone please help me with this?

Thanks


Here are some tips to help you debug this.

1. Verify Docker Configuration

  • Python Environment: Ensure the Python version inside the container matches the one used locally.
  • Installed Dependencies: Confirm that all required dependencies (like transformers, torch, etc.) are installed in the Docker environment. Use a requirements.txt file to match your local setup.
  • Device Compatibility: If running on a GPU, ensure CUDA drivers and nvidia-container-runtime are configured properly. For CPU-only use, confirm that no GPU-specific configuration is interfering. The snippet after this list checks all three points at once.
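
A quick way to verify all three points at once is a small script run inside the container (a minimal sketch; compare its output against the same script run locally):

    import sys
    import torch
    import transformers

    # Print the versions that actually exist inside the container,
    # so you can diff them against your local environment.
    print("Python:", sys.version)
    print("torch:", torch.__version__)
    print("transformers:", transformers.__version__)
    print("CUDA available:", torch.cuda.is_available())

For example, save it as check_env.py (a hypothetical name) and run it with docker exec -it <container_id> python check_env.py.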

2. Code Adjustments

  • Device Specification: In Docker, if you’re using a CPU-only environment, explicitly set the device to cpu. Example:
    device = "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device)
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(device)
    embeddings = model(**inputs).last_hidden_state.mean(dim=1)
    
  • Model Loading: If the model is not loading properly, try pinning the cache location for the pre-trained weights:
    tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="/tmp/model_cache")
    model = AutoModel.from_pretrained(model_name, cache_dir="/tmp/model_cache")
    
    Add cache_dir to prevent issues with missing files inside the container.
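
Related to this: a frequent cause of containers stalling with no error is the model download itself. If the container cannot reach huggingface.co at runtime, from_pretrained may hang while trying to fetch the weights. One way around this (a sketch, assuming the same model and cache path as above) is to pre-download the weights at image build time, e.g. via a RUN python download_model.py step in your Dockerfile, where download_model.py is a hypothetical helper script:

    # download_model.py -- executed at image build time, so the weights are
    # baked into the image and no network access is needed at runtime
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    cache_dir = "/tmp/model_cache"  # must match the cache_dir used at runtime

    AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
    AutoModel.from_pretrained(model_name, cache_dir=cache_dir)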

3. Check for Silent Failures

  • Wrap your embedding code in a try-except block to catch potential silent errors:
    try:
        inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to("cpu")
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)
        print(embeddings)
    except Exception as e:
        print(f"Error: {e}")
    

4. Ensure Docker Permissions

  • Verify file permissions inside the container. If the model or data files can’t be accessed, it could cause silent failures. Use:
    docker exec -it <container_id> ls -l /path/to/model/cache
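
You can also run the same check from inside Python (a small sketch; the path is a placeholder, so point it at wherever your weights actually live):

    import os

    cache_dir = "/tmp/model_cache"  # placeholder; use your actual cache location
    print("exists:  ", os.path.exists(cache_dir))
    print("readable:", os.access(cache_dir, os.R_OK))
    print("writable:", os.access(cache_dir, os.W_OK))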
    

Hope this helps!
