Unable to Run Sentence Transformer Text Embedding in Docker

I have a Sentence Transformer model that generates embeddings for user queries. The model works fine locally, but when I build a Docker image and run it in Docker, it does not generate the embeddings.

I don’t see any errors in the Docker logs either; it just fails silently. I have shared the code below:

    import torch
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # `sentences` is the list of user queries, defined elsewhere in the app
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to("cpu")
    with torch.no_grad():
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)  # average pooling over tokens

I have tried running the container with privileged access and allocating maximum resources, but it still fails in Docker. I am completely baffled as to why this is not working.

Can anyone please help me with this?

Thanks


Here are some tips to help you debug this.

1. Verify Docker Configuration

  • Python Environment: Ensure the Python version inside the container matches the one used locally.
  • Installed Dependencies: Confirm that all required dependencies (like transformers, torch, etc.) are installed in the Docker environment. Use a requirements.txt file to match your local setup.
  • Device Compatibility: If running on a GPU, ensure CUDA drivers and nvidia-container-runtime are configured properly. For CPU-only use, confirm that no GPU-specific configuration is interfering. The snippet after this list checks all three points at once.
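
A quick way to verify all three points at once is a small script run inside the container (a minimal sketch; compare its output against the same script run locally):

    import sys
    import torch
    import transformers

    # Print the versions that actually exist inside the container,
    # so you can diff them against your local environment.
    print("Python:", sys.version)
    print("torch:", torch.__version__)
    print("transformers:", transformers.__version__)
    print("CUDA available:", torch.cuda.is_available())

For example, save it as check_env.py (a hypothetical name) and run it with docker exec -it <container_id> python check_env.py.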

2. Code Adjustments

  • Device Specification: In Docker, if you’re using a CPU-only environment, explicitly set the device to cpu. Example:
    device = "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device)
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(device)
    embeddings = model(**inputs).last_hidden_state.mean(dim=1)
    
  • Model Loading: If the model is not loading properly, try pinning the cache location for the pre-trained weights:
    tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="/tmp/model_cache")
    model = AutoModel.from_pretrained(model_name, cache_dir="/tmp/model_cache")
    
    Add cache_dir to prevent issues with missing files inside the container.
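
Related to this: a frequent cause of containers stalling with no error is the model download itself. If the container cannot reach huggingface.co at runtime, from_pretrained may hang while trying to fetch the weights. One way around this (a sketch, assuming the same model and cache path as above) is to pre-download the weights at image build time, e.g. via a RUN python download_model.py step in your Dockerfile, where download_model.py is a hypothetical helper script:

    # download_model.py -- executed at image build time, so the weights are
    # baked into the image and no network access is needed at runtime
    from transformers import AutoTokenizer, AutoModel

    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    cache_dir = "/tmp/model_cache"  # must match the cache_dir used at runtime

    AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
    AutoModel.from_pretrained(model_name, cache_dir=cache_dir)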

3. Check for Silent Failures

  • Wrap your embedding code in a try-except block to catch potential silent errors:
    try:
        inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to("cpu")
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)
        print(embeddings)
    except Exception as e:
        print(f"Error: {e}")
    

4. Ensure Docker Permissions

  • Verify file permissions inside the container. If the model or data files can’t be accessed, it could cause silent failures. Use:
    docker exec -it <container_id> ls -l /path/to/model/cache
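
You can also run the same check from inside Python (a small sketch; the path is a placeholder, so point it at wherever your weights actually live):

    import os

    cache_dir = "/tmp/model_cache"  # placeholder; use your actual cache location
    print("exists:  ", os.path.exists(cache_dir))
    print("readable:", os.access(cache_dir, os.R_OK))
    print("writable:", os.access(cache_dir, os.W_OK))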
    

Hope this helps!
