GGUF BYOC Deployment with AWS SageMaker: [Errno 28] No space left on device

I am trying to deploy DeepSeek quantized models on AWS SageMaker following this guide, which uses the Bring Your Own Container (BYOC) approach: GitHub - aws-samples/deploy-gguf-model-to-sagemaker.

Error: [Errno 28] No space left on device
I am getting an Errno 28 (not enough disk space) during endpoint deployment, while the model is being loaded: the update_model(bucket, key) function downloads the model artifact from S3 at the location given by the MODELPATH parameter.
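For context, here is a minimal sketch of what that download step typically looks like. This is a hypothetical reconstruction, not the sample's actual code: the local target path /tmp/model.gguf is an assumption, and which filesystem it lands on determines what fills up.

```python
import boto3

def update_model(bucket: str, key: str) -> str:
    """Download the GGUF artifact from S3 to local instance storage.

    Hypothetical sketch of the helper from
    aws-samples/deploy-gguf-model-to-sagemaker; paths are assumptions.
    """
    local_path = "/tmp/model.gguf"  # assumed target directory
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, local_path)  # streams to disk, not RAM
    return local_path
```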

The DeepSeek model I downloaded and stored in S3 is https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF, using the Q6_K_L quant type, saved in GGUF format.

The only change I made to the sample code was pointing it at my DeepSeek model on S3, i.e. changing the MODELPATH variable to reference my GGUF artifact. I also used a different instance type from the sample: ml.g4dn.2xlarge for my endpoint.
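Roughly, the deployment looks like this (a sketch using the SageMaker Python SDK; the image URI and the S3 path are placeholders, and passing MODELPATH as a container environment variable follows the sample's convention):

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()

model = Model(
    image_uri="<your-ecr-byoc-image-uri>",  # BYOC image built from the sample
    role=role,
    env={
        # Only change from the sample: point MODELPATH at my GGUF artifact
        "MODELPATH": "s3://<my-bucket>/DeepSeek-R1-Distill-Qwen-32B-Q6_K_L.gguf",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.2xlarge",  # changed from the sample's instance type
)
```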

This method of deployment works fine for smaller models, like the one in the sample, but seems to fail when I reference a much larger model.

Maybe I'm missing something with regard to how instance storage works.
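One sanity check worth running before the download starts is comparing the artifact's size on S3 against the free space on the target filesystem. A sketch, assuming boto3 access to the artifact and /tmp as the download target:

```python
import shutil
import boto3

def check_space(bucket: str, key: str, target_dir: str = "/tmp") -> None:
    """Compare the S3 object's size against free space at the download target."""
    size = boto3.client("s3").head_object(Bucket=bucket, Key=key)["ContentLength"]
    free = shutil.disk_usage(target_dir).free
    print(f"artifact: {size / 1e9:.1f} GB, free at {target_dir}: {free / 1e9:.1f} GB")
    if free < size * 1.1:  # ~10% headroom for partial/temp files
        raise RuntimeError("Not enough disk space for the model download")
```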

Could someone guide me on how I could potentially resolve this issue?

df -h output: (screenshot attached, not reproduced here)


It seems that the same error occurs in SageMaker due to inode exhaustion as well as disk space exhaustion.

However, the fact that it only occurs with large models suggests that either llama.cpp is using too much RAM and causing disk swapping, or something is going wrong with the cache when the model is downloaded or during inference. If it's the cache, you may be able to specify its location and maximum size using environment variables.
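For example, if the container happens to use huggingface_hub for downloads, its cache can be redirected to a filesystem with more room. A sketch under that assumption: HF_HOME and HF_HUB_CACHE are documented huggingface_hub variables and TMPDIR is honored by Python's tempfile module, but the /opt/ml/model paths are just guesses at where a larger volume might be mounted in your container.

```python
import os

# Set these before huggingface_hub is imported, since it reads them at import
# time. All paths below are assumptions; adjust for your container layout.
os.environ["HF_HOME"] = "/opt/ml/model/hf-home"
os.environ["HF_HUB_CACHE"] = "/opt/ml/model/hf-hub-cache"
os.environ["TMPDIR"] = "/opt/ml/model/tmp"  # where partial downloads land

for path in (os.environ["HF_HOME"],
             os.environ["HF_HUB_CACHE"],
             os.environ["TMPDIR"]):
    os.makedirs(path, exist_ok=True)
```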

It’s also worth noting that DeepSeek R1 is a relatively new model, but if smaller ones are working, it’s probably not the architecture…
