I have an endpoint that processes inference in batches (currently 100 items per request; see the inference script below). I occasionally get the following error:

2022-05-19 15:25:43,484 [INFO ] W-model-4-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mms.service.PredictionException: CUDA out of me…
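One common option is to split each 100-item request into smaller sub-batches inside the handler, and to halve the sub-batch size whenever the model call runs out of GPU memory. Here is a minimal sketch. `predict_in_chunks` and its parameters are hypothetical names, not part of SageMaker or MMS. It assumes your model call takes a list of items and returns one result per item. It also assumes the OOM surfaces as a `RuntimeError`; recent PyTorch versions raise `torch.cuda.OutOfMemoryError`, which subclasses `RuntimeError`. The log shows MMS reporting the failure as `mms.service.PredictionException`, so the retry presumably has to sit inside your own inference code, around the model call.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def predict_in_chunks(
    items: Sequence[T],
    predict: Callable[[Sequence[T]], List[R]],
    chunk_size: int = 100,
    oom_errors: Tuple[Type[BaseException], ...] = (RuntimeError,),
    on_oom: Optional[Callable[[], None]] = None,
) -> List[R]:
    """Hypothetical helper: run `predict` over `items` in chunks,
    halving the chunk size each time an OOM-type error is raised."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    results: List[R] = []
    start = 0
    while start < len(items):
        chunk = items[start:start + chunk_size]
        try:
            out = predict(chunk)
        except oom_errors:
            if chunk_size == 1:
                raise  # a single item does not fit; shrinking cannot help
            if on_oom is not None:
                on_oom()  # e.g. torch.cuda.empty_cache (assumption)
            chunk_size = max(1, chunk_size // 2)
            continue  # retry the same offset with a smaller chunk
        results.extend(out)
        start += len(chunk)  # advance only after a successful chunk
    return results
```

For example, `predict_in_chunks(batch, model_predict, chunk_size=100, on_oom=torch.cuda.empty_cache)`, where `model_predict` stands in for your own batched model call. Halving keeps the number of retries small. If the size at which memory runs out is predictable, a fixed smaller chunk size is simpler.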

Getting CUDA memory error at endpoint - what are my options?

MaximusDecimusMeridi, May 20, 2022, 5:46pm

Awesome, thank you. Inferentia seems the most promising option right now, but it would be very interesting to include g5 instances in the load testing to see how they compare.
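To compare Inferentia and g5, you could run a simple sequential latency probe like the sketch below. `measure_latency` is a hypothetical helper. `invoke` is any zero-argument callable that sends one request to the endpoint under test. Requests run one after another, so this measures per-request latency, not throughput under concurrent load. Measuring throughput would need a dedicated load-testing tool.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    invoke: Callable[[], object], n_requests: int = 50, warmup: int = 5
) -> Dict[str, float]:
    """Hypothetical helper: call `invoke` sequentially and summarise
    per-request latency in milliseconds."""
    if n_requests < 2:
        raise ValueError("need at least 2 timed requests for percentiles")
    for _ in range(warmup):
        invoke()  # warm-up calls are not timed
    latencies_ms: List[float] = []
    for _ in range(n_requests):
        t0 = time.perf_counter()
        invoke()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    # quantiles(n=100) returns 99 cut points (p1..p99); "inclusive"
    # keeps them within the observed min/max
    pct = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": pct[49],
        "p90_ms": pct[89],
        "p99_ms": pct[98],
        "max_ms": max(latencies_ms),
    }
```

You could build `invoke` with boto3: set `runtime = boto3.client("sagemaker-runtime")` and pass `lambda: runtime.invoke_endpoint(EndpointName="my-endpoint", ContentType="application/json", Body=payload)`. The endpoint name and `payload` are placeholders for your own. Run it once per instance type with the same payload.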

Topic	Category	Replies	Views	Last activity
CUDA error when deploying model with custom inference	Amazon SageMaker	0	308	February 21, 2024
Regarding CUDA OOM!	Amazon SageMaker	0	499	February 14, 2023
CPU/Memory Utilization Too High When Running Inference on Falcon 40B Instruct	Amazon SageMaker	4	1600	August 31, 2023
Impossible to use flan-t5-xxl in a batch-transform job	Amazon SageMaker	3	1150	May 23, 2023
CUDA error for inference on GPU instance	Amazon SageMaker	2	766	May 16, 2023

