"OS Errorr: No space left on device" when trying to load a trained model from S3

Hello all!

I have been stuck on this for weeks and am genuinely beyond confused. For some context, I was able to successfully train and finetune CodeLlama-7B and CodeLlama-13B on SageMaker using the ml.g5.2xlarge and ml.g5.8xlarge instances and store these models in my S3 bucket. Then, I was able to deploy my CodeLlama-7B model to a SageMaker inference endpoint using the following code in my SageMaker Notebook Instance:

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",   # finetuned model archive in S3
    entry_point="inference.py",          # custom inference handlers
    source_dir="scripts",
    ... # some versioning parameters
)

predictor = model.deploy(
    endpoint_name="CodeLlama-7B",
    instance_type="ml.g5.2xlarge",
    ...
)
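
For reference, the entry_point script follows the standard SageMaker Hugging Face handler hooks (model_fn(), predict_fn()); here is a minimal sketch of what such a script can look like (the model class, dtype, and generation settings are illustrative assumptions, not my exact script):

# inference.py -- minimal sketch of SageMaker's standard handler hooks (illustrative)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # Called once when the container starts; model_dir is the unpacked model.tar.gz
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,   # assumption: half precision to fit on the GPU
        device_map="auto",           # assumption: spread layers across available GPUs
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # Called per request with the deserialized JSON payload
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=data.get("max_new_tokens", 256))
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}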

In this code, model_data points to a file (model.tar.gz) containing my finetuned model, and inference.py is the script that holds the inference handlers (model_fn(), predict_fn(), etc.), roughly as sketched above. Everything works beautifully when I deploy CodeLlama-7B. However, when I swap in the S3 file containing CodeLlama-13B, I get OSError: No space left on device. Here are several things I have tried, all of which resulted in this same error:

  1. Scaling up the instance_type to a very powerful instance, such as ml.p4d.24xlarge (which is weird, because I’ve seen tutorials hosting Llama 2-70B on this instance).
  2. Adding a volume_size parameter to my model.deploy() call on other large instances, because ml.g5.* instances don’t support attaching extra volume storage (see the sketch after this list).
  3. Using multiple GPUs and setting device_map='auto' when calling .from_pretrained().
  4. Setting the SM_NUM_GPUS environment variable.
  5. Scaling up my Notebook Instance.
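
For concreteness, here is a rough sketch of what attempts 1-4 looked like, collapsed into a single call for brevity (in reality they were separate runs, and the specific values, like the 256 GB volume and SM_NUM_GPUS=4, are assumptions):

from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",   # the finetuned CodeLlama-13B archive
    entry_point="inference.py",
    source_dir="scripts",
    env={"SM_NUM_GPUS": "4"},            # attempt 4; the GPU count is an assumption
    # plus the same role/versioning parameters as in the 7B deploy above
)

predictor = model.deploy(
    endpoint_name="CodeLlama-13B",
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",     # attempt 1: a much larger instance
    volume_size=256,                     # attempt 2: extra EBS storage, in GB
)

# Attempt 3 lives inside model_fn() in inference.py:
#   model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", ...)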

Any pointers and guidance would be very much appreciated! @philschmid , just wanted to say I’ve been following a lot of your tutorials and they have been super helpful, thank you so much for all the materials you’ve put out : )

Cheers!


The issue has been solved!

My solution was to download the model.tar.gz to my local machine, unpack it, add my custom inference script to the archive, re-upload it to S3, and then call HuggingFaceModel() and .deploy() without passing the entry script and source directory.
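
For anyone who runs into the same thing, here is a rough sketch of that flow (the local paths, bucket name, and instance type are placeholders, and the role/versioning parameters are omitted as before). My understanding is that passing entry_point/source_dir together with model_data makes the SDK download and repack the entire model.tar.gz, and that repack step is what ran out of disk space; bundling the script into the archive yourself under code/ sidesteps it:

import os
import shutil
import tarfile

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.s3 import S3Uploader

# 1. After downloading and extracting the original model.tar.gz into ./codellama-13b,
#    put the custom handler where the Hugging Face container expects it: code/inference.py
os.makedirs("codellama-13b/code", exist_ok=True)
shutil.copy("scripts/inference.py", "codellama-13b/code/inference.py")

# 2. Re-tar the whole folder and upload the new archive to S3
with tarfile.open("model-13b-with-code.tar.gz", "w:gz") as tar:
    for name in os.listdir("codellama-13b"):
        tar.add(os.path.join("codellama-13b", name), arcname=name)
model_uri = S3Uploader.upload("model-13b-with-code.tar.gz", "s3://my-bucket/codellama-13b")

# 3. Deploy pointing straight at the repacked archive; no entry_point / source_dir,
#    so the SDK has nothing to repack
model = HuggingFaceModel(
    model_data=model_uri,
    # plus the same role/versioning parameters as before
)
predictor = model.deploy(
    endpoint_name="CodeLlama-13B",
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",   # placeholder; pick whatever instance fits the model
)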
