Transformer model works locally but not in Docker container

Hi there! Fairly new to this space so bear with me…

I’m trying to containerize a model called CLAP so that I can use a FastAPI endpoint to return embeddings for a text query. CLAP uses the Hugging Face Transformers RoBERTa model (RobertaModel.from_pretrained("roberta-base")) under the hood for its text embeddings.

I was able to get CLAP running in my FastAPI application locally without many problems. However, when I containerize the application, it seems to fail when passing the input_ids/attention_mask (the tokenization output) to the RoBERTa model, which should return my embeddings.

There are no internal errors, but my API returns an empty 500 response instead of the embeddings it returns locally.

I’m guessing this is somehow related to memory usage in the container? But I am not presented with any OOM notifications. When plotting memory usage over time, I see it spike to 8 GB and then drop immediately on failure, which is suspicious.
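For what it's worth, here's the kind of quick check I can run inside the container to see whether a cgroup memory limit is actually being applied (just a sketch; the paths are the standard cgroup v2/v1 locations, nothing specific to my setup):

# sketch: read the container's cgroup memory limit from inside the container
from pathlib import Path

def container_memory_limit():
    # cgroup v2 first (newer Docker), then cgroup v1
    for p in ("/sys/fs/cgroup/memory.max",
              "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        path = Path(p)
        if path.exists():
            return path.read_text().strip()  # "max" means no limit under cgroup v2
    return "no cgroup memory file found"

print("memory limit:", container_memory_limit())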

Are there any example implementations of running RoBERTa in a Docker container?

Here is my Dockerfile

FROM python:3.11.6

WORKDIR /workspace

COPY ./services/clap/requirements.txt ./requirements.txt

RUN pip install --no-cache-dir --upgrade -r ./requirements.txt

COPY ./services/clap/app ./app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--reload"]

I’ve also tried mounting my CLAP checkpoint and the Transformers cache as volumes instead of keeping them inside the container:

clap_api:
    build:
        context: .
        dockerfile: ./services/clap/Dockerfile
    ports:
        - 8002:8000
    restart: always
    volumes:
        - ./services/clap/app:/workspace/app
        - ./services/clap/clap-data:/clap-data # where my CLAP checkpoint is stored
        - ./services/clap/cache:/root/.cache/huggingface/hub # default cache location
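To make sure the mounted cache is actually the one Transformers uses, I can list it from inside the container with something like this (a sketch; it just assumes the default ~/.cache/huggingface/hub location, since I haven't set HF_HOME or TRANSFORMERS_CACHE explicitly):

# sketch: verify the mounted Hugging Face cache is visible inside the container
import os
from pathlib import Path

cache_dir = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface")) / "hub"
print("cache dir:", cache_dir, "exists:", cache_dir.exists())
for entry in sorted(cache_dir.glob("*")):
    print(" ", entry.name)  # should include models--roberta-base after the first download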

My FastAPI route is simple; it just does the following (using the CLAP library):

import laion_clap

@app.post("/text-embedding")
def post_text_embedding(body: TextEmbeddingModel):
    # loads the CLAP checkpoint (and its RoBERTa text branch) on every request
    model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
    model.load_ckpt(get_laion_clap_ckpt())

    embeddings = model.get_text_embedding(body.queries)

    return {"embeddings": embeddings.tolist()}

The line within the CLAP repo that uses RoBERTa is the following, where self.text_branch is RobertaModel.from_pretrained('roberta-base'). You can see the definition of that variable a few lines higher.

If I comment out the contents of that elif statement, the API returns a response (without the text embeddings, of course), so I’ve narrowed it down to that line, where RoBERTa actually needs to run. That’s why I figured I could ask here, since it is Transformers/RoBERTa related.
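To isolate it further, this is roughly the minimal script I'd expect to exercise the same code path (tokenize, then pass input_ids/attention_mask to RoBERTa) without any of the CLAP wrapping; just a sketch, but I'm happy to run it inside the container:

# sketch: run just the RoBERTa text branch that CLAP uses under the hood
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
text_branch = RobertaModel.from_pretrained("roberta-base")

batch = tokenizer(["a dog barking"], padding=True, return_tensors="pt")
with torch.no_grad():
    out = text_branch(input_ids=batch["input_ids"],
                      attention_mask=batch["attention_mask"])
print(out.pooler_output.shape)  # expect torch.Size([1, 768])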

I am so puzzled as to why this stops working in Docker but works locally… usually it’s the other way around! haha.

What other information can I provide to help debug this? I’m more than happy to send it over promptly.

To make matters more confusing: on application load, I have an iterator that prints out all the parameters that have been loaded. When I load the app in the Docker environment, it only prints out about two-thirds of the parameters. I then make a change in my code, which triggers a reload of my uvicorn application, and only then does it print the remaining parameters as the server restarts.

…it’s like I’m hitting some sort of memory limit or something within Docker preventing the iterator from continuing? I cannot replicate this locally, only when it’s in a container.

Any ideas? Here’s a video replicating this problem

Some updates below. I’d still really appreciate some help here if anyone has time! :crossed_fingers:

I fixed one issue with the output not being fully printed by adding ENV PYTHONUNBUFFERED=1 to my Dockerfile.

However, the original issue still persists. I’ve discovered that I can’t run this in Docker on my M2 MacBook, but my Intel MacBook can. Here’s the matrix (and, below it, a quick architecture check I want to try):

On my M2 MacBook:

Local: works
Docker: FAILS

On my Intel MacBook:

Local: works
Docker: works
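One thing I still need to rule out is whether the container on the M2 is running natively or under x86 emulation; a quick check inside both containers should tell me (sketch):

# sketch: print the architecture the container actually sees
import platform
print("machine:", platform.machine())  # e.g. "aarch64" vs "x86_64"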

Anyone have any idea why? Here’s a GitHub repository where you can reproduce the problem:

GitHub: uncvrd/clap-mre

Again, thanks to anyone who can provide guidance here!