Hello all!
I have been stuck on this for weeks and am genuinely beyond confused. For some context, I was able to successfully train and finetune my CodeLlama-7B and CodeLlama-13B on SageMaker using the instances ml.g5.2xlarge and ml.g5.8xlarge and store these models in my S3 bucket. Then, I was able to effectively deploy my CodeLlama-7B model on the SageMaker Inference Endpoint using the following code in my SageMaker Notebook Instance:
```python
model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",
    entry_point="inference.py",
    source_dir="scripts",
    ...  # some versioning parameters
)
predictor = model.deploy(
    endpoint_name="CodeLlama-7B",
    instance_type="ml.g5.2xlarge",
    ...
)
```
In the deploy snippet above, `model_data` points to a tarball (`model.tar.gz`) containing my finetuned model, and `inference.py` is a script that holds the inference handler functions (`model_fn()`, `predict_fn()`, etc.). Everything works beautifully when I deploy my CodeLlama-7B. However, as soon as I swap in the S3 file containing my CodeLlama-13B, I start receiving an `OSError: Device Out of Space`. Here are several things I have tried, all of which resulted in the same error:
- Scaling up the `instance_type` to a very powerful instance, such as `ml.p4d.24xlarge` (which is weird, because I've seen tutorials hosting Llama 2-70B on this instance).
- Adding a `volume_size` parameter to my `model.deploy()` call with other large instances, because `ml.g5.*` instances don't support attaching extra volume storage.
- Using multi-GPU and setting `device_map='auto'` when calling `.from_pretrained()`.
- Setting the `SM_NUM_GPUS` variable.
- Scaling up my Notebook Instance.
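I also tried to reason about how much disk the endpoint actually needs. A rough back-of-the-envelope sketch (my assumptions, not measured numbers: fp16 weights at 2 bytes per parameter, and the container volume briefly holding both the compressed archive and the extracted weights while `model.tar.gz` is unpacked):

```python
# Rough disk estimate for a model artifact; bytes_per_param=2 assumes fp16.
def weights_gb(n_params_billion, bytes_per_param=2):
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# While the endpoint unpacks model.tar.gz, the volume roughly holds the
# compressed archive plus the extracted weights, so ~2x the weight size.
print(weights_gb(7))       # fp16 weights for CodeLlama-7B, in GB
print(2 * weights_gb(13))  # rough peak for CodeLlama-13B during extraction
```

By this estimate the 13B artifact needs roughly double the space the 7B did, which would explain why the 7B deployment squeaks by while the 13B one runs out of disk.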
Any pointers and guidance would be very much appreciated! @philschmid, just wanted to say I've been following a lot of your tutorials and they have been super helpful, thank you so much for all the materials you've put out : )
Cheers!