I am trying to restore a model from its S3 artifacts and then use it for batch inference, with the following code:
import sagemaker
from sagemaker import get_execution_role
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data='s3://path/output/model.tar.gz',
    name='model-name',
    entry_point='inference.py',
    role=get_execution_role(),   # IAM role with permissions to create an endpoint
    transformers_version='4.6',  # transformers version used
    pytorch_version='1.7',       # pytorch version used
    py_version='py36',           # python version used
)
transformer = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.g4dn.xlarge',  # also tried 'ml.p3.2xlarge'
    strategy='SingleRecord',
    output_path='s3://kj-temp/batch-size-issues',
)
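(For context, I then kick off the job with something like the following; the input path and content type here are placeholders, not my real values:)

# Placeholder input location and content type, just to show how the job is launched.
transformer.transform(
    data='s3://path/to/input',
    content_type='application/json',
    split_type='Line',
)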
It looks like the transformer() call downloads and unpacks the model artifact into /tmp, which is capped at 16 GB on the notebook instance, instead of using the mounted EBS volume. Is there a way to force it to use the EBS storage and avoid running out of space?
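One workaround I considered, on the (unverified) assumption that the SDK creates its scratch space through Python's tempfile module, which honors the TMPDIR environment variable, is to point the temp directory at the EBS volume before building the model:

import os

# Assumption: the SDK's download/repack goes through tempfile, which reads TMPDIR.
# TMPDIR must be set before tempfile is first used, since the result is cached.
scratch_dir = '/home/ec2-user/SageMaker/tmp'
os.makedirs(scratch_dir, exist_ok=True)
os.environ['TMPDIR'] = scratch_dir

But I haven't confirmed that this code path actually goes through tempfile, hence the question.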
When I look in /tmp while the transformer is being created, I see the following files:
sh-4.2$ du -hs * | sort -rh | head -5
2.9G model
2.7G temp-model.tar.gz
2.7G tar_file
It looks like there is some duplication going on. Watching the folder as it fills, the order is tar_file first, then model, then temp-model.tar.gz last. So it downloads the tarball, unpacks it, and then downloads (or re-creates?) the compressed tarball a second time?

Either way, I believe I need to move this staging under /home/ec2-user/SageMaker to use the extra storage. Is there a way to do that? I'm trying to load several models and kick off batch jobs, and they all try to write to /tmp at the same time. Alternatively, is there a way to create the model without downloading the artifacts at all, i.e. a boto3 call that doesn't need to do anything locally?
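For that last idea, I was picturing something along these lines. This is only a sketch: the container image URI, job name, and input path are placeholders, and I realize I would lose the entry_point='inference.py' injection, so the script would presumably need to be packaged inside model.tar.gz already:

import boto3
from sagemaker import get_execution_role

sm = boto3.client('sagemaker')
role_arn = get_execution_role()

# Create the model straight from the S3 artifact; nothing is downloaded locally.
# '<huggingface-inference-image-uri>' is a placeholder for the HuggingFace
# inference container matching transformers 4.6 / pytorch 1.7 / py36.
sm.create_model(
    ModelName='model-name',
    ExecutionRoleArn=role_arn,
    PrimaryContainer={
        'Image': '<huggingface-inference-image-uri>',  # placeholder
        'ModelDataUrl': 's3://path/output/model.tar.gz',
    },
)

# Launch the batch transform job against that model, again purely server-side.
sm.create_transform_job(
    TransformJobName='batch-job-name',  # placeholder
    ModelName='model-name',
    BatchStrategy='SingleRecord',
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://path/to/input',  # placeholder
            }
        },
    },
    TransformOutput={'S3OutputPath': 's3://kj-temp/batch-size-issues'},
    TransformResources={
        'InstanceType': 'ml.g4dn.xlarge',
        'InstanceCount': 1,
    },
)

Would something like that avoid the local download entirely, or does part of it still happen client-side?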