No space left on device when trying to run batch inference - HF not using EBS storage?

I am trying to restore a model from its S3 artifacts and then use it for batch inference. I use the following code:


from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker import get_execution_role


huggingface_model = HuggingFaceModel(
    model_data='s3://path/output/model.tar.gz',
    name='model-name',
    entry_point='inference.py',
    role=get_execution_role(),   # IAM role with permissions to create an endpoint
    transformers_version='4.6',  # transformers version used
    pytorch_version='1.7',       # pytorch version used
    py_version='py36',           # python version used
)

batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',  # or 'ml.p3.2xlarge'
    strategy='SingleRecord',
    output_path='s3://kj-temp/batch-size-issues',
)
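I then kick off the job from the returned transformer, roughly like this (the input path and content type below are placeholders for my real data):

batch_job.transform(
    data='s3://path/batch-input/',    # placeholder input prefix
    content_type='application/json',  # placeholder - matches my input format
    split_type='Line',
)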

It looks like the transformer method downloads and unpacks the model into the /tmp folder, which has a 16 GB limit, rather than using the mounted EBS volume. Is there a way to force it to use the EBS storage to avoid running out of space?

When I look in the /tmp folder while the transformer is loading, I see the following files:

sh-4.2$ du -hs * | sort -rh | head -5
2.9G    model
2.7G    temp-model.tar.gz
2.7G    tar_file

It looks like there is some duplication going on? Watching the folder as it fills, the files appear in the order tar_file, then model, with temp-model.tar.gz last. So it downloads the tarball, unpacks it, and then writes the compressed tarball again? Either way, I believe I need to store it under /home/ec2-user/SageMaker to use the extra storage - is there a way to do this? I'm trying to load a few models and kick off batch jobs, and they are all trying to write to /tmp at the same time. Alternatively, is there a way to create the model without having to download the artifacts at all - something like a boto3 call that doesn't need to do anything locally?
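Something like this rough boto3 sketch is what I have in mind - the role ARN and container image are placeholders, since I'm not sure which inference image to point at:

import boto3

sm = boto3.client('sagemaker')

# Register the model straight from the S3 artifact - nothing is downloaded locally
sm.create_model(
    ModelName='model-name',
    ExecutionRoleArn='arn:aws:iam::...:role/...',  # placeholder IAM role ARN
    PrimaryContainer={
        'Image': '<hf-inference-container-uri>',   # placeholder container image
        'ModelDataUrl': 's3://path/output/model.tar.gz',
    },
)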

Hello @MaximusDecimusMeridi,

When using entry_point in your HuggingFaceModel, the SageMaker SDK downloads your model.tar.gz from S3, unpacks it on the machine where you called .transformer(), adds your inference.py, and then uploads everything to S3 again.
To avoid this you can either repackage the model.tar.gz manually and add your inference.py inside it under code/ (see the documentation), or, if you are not using any specific pre-/post-processing, provide the environment variable HF_TASK with the fine-tuning task, e.g. text-classification.
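For the second option, a minimal sketch using your artifact path from above - dropping entry_point so nothing needs to be repackaged, and letting the inference toolkit build the pipeline from HF_TASK (the task name is just an example):

huggingface_model = HuggingFaceModel(
    model_data='s3://path/output/model.tar.gz',
    env={'HF_TASK': 'text-classification'},  # example task; no entry_point, so the artifact is used as-is
    role=get_execution_role(),
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
)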

@philschmid thanks! I’ll give this a try and confirm