Creating a SageMaker Endpoint for 2 Models (Segment Anything & YOLOv8) and Invoking It

We have created a real-time SageMaker Endpoint to deploy and invoke 2 PyTorch models. The endpoint is created successfully, but we receive errors when we invoke it, such as "Backend worker died" and "Backend worker error". We are using an "ml.g4dn.2xlarge" instance along with the following parameters:

framework_version="2.0.1"
py_version="py310"
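In case the packaging is relevant: our model.tar.gz is intended to follow the layout the SageMaker PyTorch container documents, with the checkpoints at the archive root and the inference code under code/. The sketch below builds an archive with that layout using placeholder file contents (not our real artifacts), just to show the structure:

```python
import os
import tarfile
import tempfile

# Target layout (per the SageMaker PyTorch container documentation):
# model.tar.gz
# |-- sam_vit_l_0b3195.pth
# |-- yolov8n.pt
# `-- code/
#     |-- inference.py
#     `-- requirements.txt   (e.g. ultralytics, segment-anything)

workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, "code"))

# Placeholder files stand in for the real model weights and code here.
for path in ["sam_vit_l_0b3195.pth", "yolov8n.pt",
             "code/inference.py", "code/requirements.txt"]:
    with open(os.path.join(workdir, path), "w") as f:
        f.write("placeholder")

archive = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    # arcname keeps the paths relative to the archive root.
    for path in ["sam_vit_l_0b3195.pth", "yolov8n.pt", "code"]:
        tar.add(os.path.join(workdir, path), arcname=path)

with tarfile.open(archive) as tar:
    print(sorted(tar.getnames()))
```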

Some notable errors from CloudWatch after running the invocation multiple times are:

2024-01-06T11:57:59,036 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/ts/model_service_worker.py", line 184, in handle_connection

2024-01-06T11:57:59,036 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.

2024-01-06T11:58:00,960 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error

2024-01-06T11:58:03,486 [ERROR] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - Unknown exception

2024-01-06T13:46:02,364 [ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error

We have added many log statements to our inference.py file, but it seems the invocation stops even before the inference file runs, as the backend worker dies. The 2 models we are using are:

  1. sam_vit_l_0b3195.pth (Segment Anything model)

  2. yolov8n.pt
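For context, our inference.py is structured around the standard handler contract the SageMaker PyTorch container calls (model_fn, input_fn, predict_fn, output_fn). The sketch below shows that contract; the actual SAM/YOLO loading lines are illustrative assumptions shown as comments, with placeholders in their place so the skeleton is self-contained:

```python
import json
import os

def model_fn(model_dir):
    # Illustrative loading code (assumption; adjust to your checkpoints):
    #   from segment_anything import sam_model_registry
    #   from ultralytics import YOLO
    #   sam = sam_model_registry["vit_l"](
    #       checkpoint=os.path.join(model_dir, "sam_vit_l_0b3195.pth"))
    #   yolo = YOLO(os.path.join(model_dir, "yolov8n.pt"))
    #   return {"sam": sam, "yolo": yolo}
    # If this function raises (OOM, missing package), the backend worker
    # dies before any request ever reaches predict_fn.
    return {"sam": "sam-placeholder", "yolo": "yolo-placeholder"}

def input_fn(request_body, content_type="application/json"):
    # Deserialize the invocation payload.
    assert content_type == "application/json"
    return json.loads(request_body)

def predict_fn(data, models):
    # Dispatch to one of the two loaded models based on a request field.
    which = data.get("model", "yolo")
    return {"model_used": which, "result": "..."}

def output_fn(prediction, accept="application/json"):
    # Serialize the response back to the caller.
    return json.dumps(prediction)
```

Since the worker dies during load, the model_fn stage (and the requirements.txt dependencies it needs inside the container) is where we have focused our debugging so far.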