I am trying to restore a model from its S3 artifacts and then use it for batch inference, with the following code:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data='s3://path/output/model.tar.gz',
    name='model-name',
    entry_point='inference.py',
    role=get_execution_role(),   # IAM role with permissions to create an endpoint
    transformers_version='4.6',  # transformers version used
    pytorch_version='1.7',       # pytorch version used
    py_version='py36',           # python version used
)
transformer = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',  # also tried 'ml.p3.2xlarge'
    strategy='SingleRecord',
    output_path='s3://kj-temp/batch-size-issues',
)
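(For context, I then kick off the job with something like the following; the input path and content type here are placeholders, not my real values:)

# Placeholder input location and content type, just to show how the job is launched.
transformer.transform(
    data='s3://path/to/input',
    content_type='application/json',
    split_type='Line',
)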
It looks like the transformer() call downloads and unpacks the model artifact into /tmp, which is capped at 16 GB on the notebook instance, instead of using the mounted EBS volume. Is there a way to force it to use the EBS storage and avoid running out of space?
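One workaround I considered, on the (unverified) assumption that the SDK creates its scratch space through Python's tempfile module, which honors the TMPDIR environment variable, is to point the temp directory at the EBS volume before building the model:

import os

# Assumption: the SDK's download/repack goes through tempfile, which reads TMPDIR.
# TMPDIR must be set before tempfile is first used, since the result is cached.
scratch_dir = '/home/ec2-user/SageMaker/tmp'
os.makedirs(scratch_dir, exist_ok=True)
os.environ['TMPDIR'] = scratch_dir

But I haven't confirmed that this code path actually goes through tempfile, hence the question.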
When I look in /tmp while the transformer is being created, I see the following files:
sh-4.2$ du -hs * | sort -rh | head -5
2.9G model
2.7G temp-model.tar.gz
2.7G tar_file
It looks like there is some duplication going on. Watching the folder as it fills, the order is tar_file first, then model, then temp-model.tar.gz last. So it downloads the tarball, unpacks it, and then downloads (or re-creates?) the compressed tarball a second time?

Either way, I believe I need to move this staging under /home/ec2-user/SageMaker to use the extra storage. Is there a way to do that? I'm trying to load several models and kick off batch jobs, and they all try to write to /tmp at the same time. Alternatively, is there a way to create the model without downloading the artifacts at all, i.e. a boto3 call that doesn't need to do anything locally?
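For that last idea, I was picturing something along these lines. This is only a sketch: the container image URI, job name, and input path are placeholders, and I realize I would lose the entry_point='inference.py' injection, so the script would presumably need to be packaged inside model.tar.gz already:

import boto3
from sagemaker import get_execution_role

sm = boto3.client('sagemaker')
role_arn = get_execution_role()

# Create the model straight from the S3 artifact; nothing is downloaded locally.
# '<huggingface-inference-image-uri>' is a placeholder for the HuggingFace
# inference container matching transformers 4.6 / pytorch 1.7 / py36.
sm.create_model(
    ModelName='model-name',
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        'Image': '<huggingface-inference-image-uri>',  # placeholder
        'ModelDataUrl': 's3://path/output/model.tar.gz',
    },
)

# Launch the batch transform job against that model, again purely server-side.
sm.create_transform_job(
    TransformJobName='batch-job-name',  # placeholder
    ModelName='model-name',
    BatchStrategy='SingleRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://path/to/input',  # placeholder
            }
        },
    },
    TransformOutput={'S3OutputPath': 's3://kj-temp/batch-size-issues'},
    TransformResources={
        'InstanceType': 'ml.g4dn.xlarge',
        'InstanceCount': 1,
    },
)

Would something like that avoid the local download entirely, or does part of it still happen client-side?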