Hi there! Fairly new to this space so bear with me…
I’m trying to containerize a model called CLAP so I can serve text-query embeddings through a FastAPI endpoint. Under the hood, CLAP uses the Hugging Face Transformers RoBERTa model, loaded via from_pretrained("roberta-base"), for its text embeddings.
I was able to get CLAP running in my FastAPI application locally without much trouble. However, when I containerize the application, it seems to fail at the point where the tokenizer output (input_ids/attention_mask) is passed to the RoBERTa model, which should return my embeddings.
There are no internal errors, but my API returns an empty 500 response instead of the embeddings it returns locally.
I’m guessing this is somehow related to memory usage in the container, but I don’t get any OOM notifications. When I plot memory usage over time, I see it spike to 8 GB and then drop immediately at the point of failure, which seems suspicious.
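In case it’s relevant, here’s the quick check I run inside the container to see whether a memory cap is set at the cgroup level (a rough sketch; the two paths below are the standard cgroup v2 and v1 locations, so adjust if your setup differs):

```python
# Quick check for a container memory cap (cgroup v2 first, then v1).
# Run inside the container; "max" means no limit is set.
from pathlib import Path

def container_memory_limit() -> str:
    for p in ("/sys/fs/cgroup/memory.max",                    # cgroup v2
              "/sys/fs/cgroup/memory/memory.limit_in_bytes"): # cgroup v1
        f = Path(p)
        if f.exists():
            return f.read_text().strip()
    return "no cgroup limit file found"

print("memory limit:", container_memory_limit())
```

I mention this because, as far as I understand, when a containerized process exceeds its memory limit the kernel’s OOM killer terminates it with exit code 137 and no Python traceback, and on Docker Desktop (macOS/Windows) the VM-wide memory allocation in the Desktop settings can also be the cap that gets hit.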
Are there any example implementations of running RoBERTa in a Docker container?
Here is my Dockerfile:
FROM python:3.11.6
WORKDIR /workspace
COPY ./services/clap/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir --upgrade -r ./requirements.txt
COPY ./services/clap/app ./app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--reload"]
I’ve also tried mounting my CLAP checkpoint and the Transformers cache as volumes instead of storing them inside the container:
clap_api:
  build:
    context: .
    dockerfile: ./services/clap/Dockerfile
  ports:
    - 8002:8000
  restart: always
  volumes:
    - ./services/clap/app:/workspace/app
    - ./services/clap/clap-data:/clap-data                # where my CLAP checkpoint is stored
    - ./services/clap/cache:/root/.cache/huggingface/hub  # default cache location
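To double-check that Transformers is actually reading from that mounted cache, I list it from inside the container (a small sketch; it assumes the default cache location, which newer huggingface_hub versions derive from HF_HOME):

```python
# List cached models to confirm the mounted HF cache is visible in the container.
import os
from pathlib import Path

hf_home = Path(os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface")))
hub_cache = hf_home / "hub"  # default hub cache location
print("hub cache:", hub_cache)
for entry in sorted(hub_cache.glob("models--*")):
    print("  cached:", entry.name)
```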
My FastAPI route is simple; it just does the following (using the CLAP library):
@app.post("/text-embedding")
def post_text_embedding(body: TextEmbeddingModel):
    model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
    model.load_ckpt(get_laion_clap_ckpt())
    embeddings = model.get_text_embedding(body.queries)
    return {"embeddings": embeddings.tolist()}
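For what it’s worth, here’s a variant I’m considering that loads CLAP once at startup instead of on every request, since re-instantiating the model per call is presumably what drives the repeated memory spikes (a sketch; get_laion_clap_ckpt is my own helper and the checkpoint path is a placeholder):

```python
import laion_clap
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TextEmbeddingModel(BaseModel):
    queries: list[str]

def get_laion_clap_ckpt() -> str:
    # My own helper; placeholder path for illustration.
    return "/clap-data/checkpoint.pt"

# Load the model once at import time so every request reuses the same instance.
model = laion_clap.CLAP_Module(enable_fusion=False, amodel="HTSAT-base")
model.load_ckpt(get_laion_clap_ckpt())

@app.post("/text-embedding")
def post_text_embedding(body: TextEmbeddingModel):
    embeddings = model.get_text_embedding(body.queries)
    return {"embeddings": embeddings.tolist()}
```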
The line within the CLAP repo that uses RoBERTa is the forward call on self.text_branch, where self.text_branch is RobertaModel.from_pretrained('roberta-base') (you can see the definition of that variable a few lines higher). If I comment out the body of that elif statement, the API returns a response, without the text embeddings of course, so I’ve narrowed it down to that exact line, where RoBERTa actually needs to run. That’s why I figured I could ask here, since it’s Transformers/RoBERTa related.
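To isolate this from CLAP and FastAPI entirely, this is the minimal RoBERTa-only repro I can run inside the container (standard Transformers API, nothing CLAP-specific):

```python
# Minimal in-container repro: tokenize a query and run the RoBERTa forward
# pass directly, bypassing CLAP and FastAPI.
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

batch = tokenizer(["a test query"], return_tensors="pt", padding=True)
with torch.no_grad():
    output = model(input_ids=batch["input_ids"],
                   attention_mask=batch["attention_mask"])

print(output.last_hidden_state.shape)  # expected: torch.Size([1, seq_len, 768])
```

If this fails the same way, that would confirm the problem is RoBERTa-in-Docker rather than CLAP itself.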
I am so puzzled as to why this works locally but stops working in Docker… usually it’s the other way around! haha.
If there’s any other information I can provide to help debug this, I’m more than happy to send it over promptly.