Hello all!
I have been stuck on this for weeks and am genuinely beyond confused. For some context: I was able to successfully train and fine-tune my CodeLlama-7B and CodeLlama-13B models on SageMaker using the ml.g5.2xlarge and ml.g5.8xlarge instances, and store these models in my S3 bucket. Then I was able to deploy my CodeLlama-7B model to a SageMaker Inference Endpoint without issue, using the following code in my SageMaker Notebook Instance:
model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",
    entry_point="inference.py",
    source_dir="scripts",
    ...  # some versioning parameters
)
predictor = model.deploy(
    endpoint_name="CodeLlama-7B",
    instance_type="ml.g5.2xlarge",
    ...
)
In this code, model_data points to a file (model.tar.gz) containing my fine-tuned model, and inference.py is a script that holds the functions for inference (model_fn(), predict_fn(), etc.).
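For reference, here's a stripped-down sketch of what my inference.py does (the dtype and generation settings shown here are illustrative placeholders, not my exact values):

# inference.py -- simplified sketch; dtype and generation settings are illustrative
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def model_fn(model_dir):
    # Called once when the endpoint starts: load the fine-tuned model that
    # SageMaker has unpacked from model.tar.gz into model_dir.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        device_map="auto",  # spread layers across all visible GPUs
    )
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # Called per request: generate a completion for the input prompt.
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data["inputs"], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return {"generated_text": tokenizer.decode(output_ids[0], skip_special_tokens=True)}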
Everything works beautifully when I deploy the CodeLlama-7B model. However, when I swap in the S3 file containing my CodeLlama-13B model, I start receiving an OS Error: Device Out of Space error. Here is everything I have tried, all of which still resulted in this same error (a sketch of some of these attempts follows the list):
- Scaling up the instance_type to a much more powerful instance, such as ml.p4d.24xlarge (which is weird, because I've seen tutorials hosting Llama 2-70B on this instance).
- Adding a volume_size parameter to my model.deploy() call on other large instances, since ml.g5.* instances don't support attaching extra volume storage.
- Using multi-GPU and setting device_map='auto' when calling .from_pretrained().
- Setting the SM_NUM_GPUS environment variable.
- Scaling up my Notebook Instance.
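For concreteness, the scaled-up deployment attempts looked roughly like this (the instance type, volume size, and GPU count below are just example values I tried, not exact records):

# Sketch of the scaled-up deployment attempt; all sizes are illustrative.
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    model_data="s3://...model.tar.gz",  # same layout as before, now the 13B artifact
    entry_point="inference.py",
    source_dir="scripts",
    env={"SM_NUM_GPUS": "8"},  # attempt: tell the serving stack how many GPUs to use
    # ... same versioning parameters as above
)
predictor = model.deploy(
    initial_instance_count=1,
    endpoint_name="CodeLlama-13B",
    instance_type="ml.p4d.24xlarge",  # attempt: much larger multi-GPU instance
    volume_size=256,  # attempt: extra EBS storage in GB (rejected on ml.g5.*)
)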
Any pointers or guidance would be very much appreciated! @philschmid, I just wanted to say I've been following a lot of your tutorials and they have been super helpful. Thank you so much for all the materials you've put out : )
Cheers!