Hi,
I have a container running an API built with FastAPI that serves predictions from a model.
If I send requests sequentially everything works well. When I send them in parallel with joblib I start getting CUDA errors.
To try to solve this, I disabled workers in uvicorn and added a CUDA stream, but I still get the issue.
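
One thing that may be worth ruling out: FastAPI runs plain `def` endpoints in a thread pool, so even with a single uvicorn worker several requests can reach the model at the same time. Below is a minimal sketch of serializing access with a lock. It is not my actual code: `FakeModel`, `predict_serialized`, and `run_parallel` are hypothetical names, the model is a stand-in that only counts concurrent callers, and a thread pool simulates the parallel clients instead of joblib.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Hypothetical stand-in for the real model (assumption, no GPU involved).

    It records the highest number of threads inside `predict` at once.
    """

    def __init__(self):
        self._active = 0
        self.max_active = 0
        self._counter_lock = threading.Lock()

    def predict(self, x):
        with self._counter_lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        time.sleep(0.01)  # pretend to do GPU work
        with self._counter_lock:
            self._active -= 1
        return x * 2


model = FakeModel()
gpu_lock = threading.Lock()  # one lock per process, shared by all handlers


def predict_serialized(x):
    # Only one thread at a time may call into the model.
    with gpu_lock:
        return model.predict(x)


def run_parallel(inputs, n_threads=8):
    # Simulates parallel clients hitting the endpoint.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(predict_serialized, inputs))


results = run_parallel(range(16))
```

In the real API the body of the endpoint would take the place of `predict_serialized`. If the errors disappear with the lock in place, concurrent access to the model is the likely cause; if they persist, the problem is probably elsewhere.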