Sagemaker not being able to download EleutherAI/gpt-j-6B model from the Hugging Face Hub on start up

Trying to create a basic inference endpoint on SageMaker, but the model download reaches 30-40%, then restarts, and the loop keeps repeating. After 20-30 minutes it just fails.

Tried different instance types as well, but the same issue persists.

Any help would be really appreciated.

Here is the exact code:

```python
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

# Hub model configuration (model is pulled from the Hugging Face Hub at start-up)
hub = {
    'HF_MODEL_ID': 'EleutherAI/gpt-j-6B',
    'HF_TASK': 'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version='4.6.1',  # versions are illustrative; use your setup
    pytorch_version='1.7.1',
    py_version='py36',
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.4xlarge'  # ec2 instance type
)

predictor.predict({
    'inputs': "Can you please let us know more details about your "
})
```

Pinging @philschmid here :slight_smile:

Hey @amangrk,

Sorry for missing your post.
Sadly, only models smaller than 10 GB are supported for direct loading from the Hub via the configuration. In addition, I can tell you that even if you moved GPT-J 6B to S3 and then tried to deploy from there, SageMaker would time out, unless you went with the P4 instances, but they are very expensive.
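For anyone trying the S3 route: SageMaker expects a `model.tar.gz` whose files sit at the *root* of the archive, not nested inside a directory. A minimal sketch of the packaging step, using only the standard library (the placeholder files stand in for real artifacts written by `save_pretrained`; the directory names are hypothetical):

```python
import os
import tarfile
import tempfile

# Hypothetical local directory holding the model files
# (in practice: the output of model.save_pretrained / tokenizer.save_pretrained)
model_dir = tempfile.mkdtemp()
for name in ("config.json", "pytorch_model.bin"):
    with open(os.path.join(model_dir, name), "w") as f:
        f.write("placeholder")  # stand-ins for the real model artifacts

# Tar the *contents* of the directory, not the directory itself,
# so each file lands at the root of the archive.
archive = os.path.join(tempfile.mkdtemp(), "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name in os.listdir(model_dir):
        tar.add(os.path.join(model_dir, name), arcname=name)

with tarfile.open(archive, "r:gz") as tar:
    print(sorted(tar.getnames()))  # -> ['config.json', 'pytorch_model.bin']
```

The resulting `model.tar.gz` is what you upload to S3 and pass to `HuggingFaceModel` via `model_data` instead of the `hub` env configuration.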

I already shared this with the AWS Team and they are looking into it.

Any updates on this, @philschmid? I'm running into the same issue with the bigscience/tp00 model.

We created an example of how to deploy GPT-J 6B; maybe it can help you: Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker
We are also working on new approaches to model parallelism, which might make things easier.