SageMaker not able to download the EleutherAI/gpt-j-6B model from the Hugging Face Hub on startup

Trying to create a basic inference endpoint on SageMaker, but I can see that the model downloads to 30-40% and then the download restarts, and this loop keeps repeating. After 20-30 minutes it just fails.

Tried with different instance types as well, but the same issue persists.

Any help would be really appreciated.

Here is the exact code:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'EleutherAI/gpt-j-6B',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,       # number of instances
    instance_type='ml.m5.4xlarge'   # EC2 instance type
)

predictor.predict({
    'inputs': "Can you please let us know more details about your "
})

Pinging @philschmid here 🙂

Hey @amangrk,

Sorry for missing your post.
Sadly, only models < 10 GB are supported for direct loading from the Hub via the configuration. In addition, I can tell you that even if you moved GPT-J 6B to S3 and then tried to deploy it, SageMaker would time out, unless you went with the P4 instances, but they are very expensive.
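For the S3 route, the weights need to be packed into a `model.tar.gz` with the model files at the archive root, which you then upload to S3 and pass as `model_data` to `HuggingFaceModel` instead of the `hub` env. A minimal sketch of the packaging step, assuming you already saved the model locally to a directory like `./gpt-j-6B` (hypothetical path):

```python
import os
import tarfile

def package_model(model_dir: str, archive_path: str = "model.tar.gz") -> str:
    """Pack the contents of model_dir into a gzipped tarball with the
    files at the archive root, which is the layout SageMaker expects."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path

# After packaging, upload the archive to S3 (e.g. with
# sagemaker.s3.S3Uploader.upload) and pass the resulting S3 URI
# as model_data=... when constructing HuggingFaceModel.
```

Note that for a 6B-parameter model the archive is still tens of gigabytes, so the endpoint startup time (and the timeout issue mentioned above) remains the real bottleneck.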

I have already shared this with the AWS team and they are looking into it.

Any updates on this, @philschmid? Running into the same issue with the bigscience/tp00 model.

We created an example of how to deploy GPT-J 6B; maybe it can help you: Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker
We are also working on new approaches to model parallelism which might make things easier.