Error 403 when downloading model for Sagemaker batch inference

I am creating a batch job with the code below. However, it fails immediately with a 403 Forbidden client error. My CloudWatch logs show the following output (full traceback below):

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var `HF_MODEL_ID`

immediately followed by:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/sentence-transformers/all-mpnet-base-v2

after which the batch job fails. Deploying the same model to a real-time endpoint works fine.
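For reference, this is roughly the endpoint deployment that works for me (a minimal sketch using the same huggingface_model defined in the batch code below; the instance type and test payload are just what I happened to use):

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge'
)

# the endpoint container downloads the same model from the Hub without a 403
result = predictor.predict({"inputs": "a quick smoke test sentence"})
predictor.delete_endpoint()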

The full code for the batch job:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub model configuration. https://huggingface.co/models
hub = {
  'HF_MODEL_ID': 'sentence-transformers/all-mpnet-base-v2',
  'HF_TASK': 'feature-extraction'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
    env=hub,
    role=role,
)



batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path='s3://kj-temp/hf/out',  # save the output under the same bucket/prefix as the input
    strategy='SingleRecord'
)

# starts the batch transform job, using S3 data as input
batch_job.transform(
    data=test_input,  # S3 URI of the JSON Lines input file
    content_type='application/json',
    split_type='Line',
    wait=False
)

and the full traceback:


Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
    serving.main()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 34, in main
    _start_mms()
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/serving.py", line 30, in _start_mms
    mms_model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/mms_model_server.py", line 75, in start_model_server
    use_auth_token=HF_API_TOKEN,
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_huggingface_inference_toolkit/transformers_utils.py", line 154, in _load_model_from_hub
    model_info = _api.model_info(repo_id=model_id, revision=revision, token=use_auth_token)
  File "/opt/conda/lib/python3.6/site-packages/huggingface_hub/hf_api.py", line 155, in model_info
    r.raise_for_status()
  File "/opt/conda/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

Can you please retry? There was an issue with loading Sentence Transformers from the Hub.

Thanks for the follow-up. Still the same, however… I followed the same steps as before:

It errors with the following message:

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var HF_MODEL_ID

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/cardiffnlp/twitter-roberta-base-sentiment

@kjackson for me it worked, exactly as you described it. I checked out the repo and ran the notebook as-is from top to bottom without errors. I used the conda_pytorch_p36 kernel.

:exploding_head: I have no idea. I just started an ml.g4dn.xlarge instance and followed the above steps using the conda_pytorch_p36 kernel. Cloned and executed. 403 Client Error.

Could I be hitting some sort of request rate limit? The weird part is that when I look in the endpoint logs, it appears to execute the same download command and it works (i.e. the same beta warning appears, but then the download completes fine).
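One way to check from the notebook would be to hit the Hub API directly and look at the status code (a quick sketch with requests, using one of the model IDs from my jobs):

import requests

# the same endpoint the inference toolkit calls on startup;
# a 403 here would mean the request is blocked outside of SageMaker too
url = 'https://huggingface.co/api/models/sentence-transformers/all-mpnet-base-v2'
resp = requests.get(url)
print(resp.status_code, resp.reason)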

Could there be some sort of permission set at the AWS service level which prevents it from reaching the URL? Though since the 403 comes back from the server, shouldn’t that mean it isn’t something on my end?

I’ll keep trying, but I hope you’ll bear with me as I troubleshoot. We have lots of use cases for these models, but I’m getting errors everywhere I try, so you’ll see a few posts from me coming through :stuck_out_tongue:

Hey @kjackson,

We recorded a video on how to run batch transform maybe this helps you get started easier: https://www.youtube.com/watch?v=lnTixz0tUBg

When you look at the logs you should have two log streams. Could you please share as much information as possible? The error message should normally explain why it failed.
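You can list the streams from the notebook with boto3 (a sketch; the log group is the default one for batch transform, and the stream prefix here is a hypothetical job name):

import boto3

logs = boto3.client('logs')

# batch transform jobs write to this log group by default
streams = logs.describe_log_streams(
    logGroupName='/aws/sagemaker/TransformJobs',
    logStreamNamePrefix='your-transform-job-name'  # hypothetical job name
)
for s in streams['logStreams']:
    print(s['logStreamName'])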

Could there be some sort of permission set at the AWS service level which prevents it from reaching the URL? Though since the 403 comes back from the server, shouldn’t that mean it isn’t something on my end?

Which IAM role do you use to run the batch transform job?

Thanks for the video. I’ve been trying to follow a similar example, but modifying it to get batch embeddings. I’ve included the code that produces the batch inference error for me at the bottom. I’m using a p3.2xlarge SageMaker notebook instance with a conda_pytorch_p36 kernel.

For the role I am using the default SageMaker role. I am not sure exactly which IAM privileges are invoked when making that HTTP request, but I’m happy to check if you know where I can look.
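For instance, I could list the policies attached to the execution role with boto3 (a sketch; it assumes the role name is the last segment of the role ARN):

import boto3
import sagemaker

role_arn = sagemaker.get_execution_role()
role_name = role_arn.split('/')[-1]  # assumes a simple role ARN without a path

iam = boto3.client('iam')
for policy in iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']:
    print(policy['PolicyArn'])

Though as far as I understand, IAM shouldn’t gate outbound HTTPS calls from the container anyway, so this may be a dead end.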

I’ve also tried in a different AWS account with the same results.

If I let the job run all the way to completion (it takes 36 minutes due to retries), I only see one log stream.

This is the output:

This is an experimental beta features, which allows downloading model from the Hugging Face Hub on start up. It loads the model defined in the env var HF_MODEL_ID

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/models/facebook/bart-large-mnli


followed by the same traceback as in my first post (identical line for line, so omitted here).
Full code below, using a p3.2xlarge SageMaker notebook instance with a conda_pytorch_p36 kernel. Thanks!

!pip install "sagemaker>=2.48.0" "transformers==4.6.1" "datasets[s3]==1.6.2" --upgrade


import sagemaker

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

# Download some training data

!wget https://github.com/saurabh3949/Text-Classification-Datasets/raw/master/dbpedia_csv.tar.gz
!tar -xzvf dbpedia_csv.tar.gz

import pandas as pd
import json
import os

# Write small train and test files

df = pd.read_csv('dbpedia_csv/train.csv', header = None)

# write a small train input file
with open('train_text.json', 'w') as outfile:
    for desc in df.iloc[:10000, 2]:
        json.dump({"inputs": desc}, outfile)
        outfile.write('\n')
        
# write a small test input file
with open('test_text.json', 'w') as outfile:
    for desc in df.iloc[10000:15000, 2]:
        json.dump({"inputs": desc}, outfile)
        outfile.write('\n')

from sagemaker.s3 import S3Uploader

s3_prefix = 'batch-data'

training_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/train'
test_input_path = f's3://{sess.default_bucket()}/{s3_prefix}/test'


# upload datasets
train_remote = S3Uploader.upload('train_text.json',training_input_path)
test_remote = S3Uploader.upload('test_text.json',test_input_path)

print(f"train dataset uploaded to: \n{train_remote}\n{test_remote}")

from sagemaker.huggingface import HuggingFaceModel


model = 'facebook/bart-large-mnli'
task = 'feature-extraction'

hub = {
  'HF_MODEL_ID':model,
  'HF_TASK':task
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
    env=hub,
    role=role
)


instance = "ml.p3.2xlarge"
name = f'hf-{model[:20].replace("/", "")}--{task[:25]}--{instance.split(".")[-1]}'



batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type=instance,
    output_path=f's3://{sess.default_bucket()}/{s3_prefix}/out',  # save the output under the same bucket/prefix as the input
    strategy='SingleRecord'
)

# starts the batch transform job, using S3 data as input
batch_job.transform(
    data=test_remote,
    content_type='application/json',
    split_type='Line',
    wait=False
)

@kjackson I’ve also been facing this issue for a couple of days now and have tried everything in this thread and other related ones. Have you found a solution yet?

@bobbydylan could you share how you are creating your batch transform job and the error you see?

@philschmid Thank you for checking in. I never figured out why the batch transform Hub Model configuration was causing problems, but I was able to get around the issue by downloading the model directly, compressing it, and then uploading it to S3.

Maybe the versions of PyTorch and Transformers used by the batch transform Hub Model configuration were incompatible with the versions my model had been trained with. I’m not sure though, and I’m still a bit confused, since that wouldn’t explain why I couldn’t get it to work for the model used in your example.

@kjackson the code for the workaround above is below in case it’s helpful:

1. get model data
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
MODEL = 'xxxxxxxxxx/xxxxxxxxxxxx'
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model.save_pretrained('model_token')
tokenizer.save_pretrained('model_token')
2. compress and save to new folder in notebook directory
!cd model_token && tar zcvf model.tar.gz * 
!mv model_token/model.tar.gz ./model.tar.gz
3. upload compressed model to session s3 bucket
import sagemaker
from sagemaker.s3 import S3Uploader,s3_path_join

# get the s3 bucket
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
sagemaker_session_bucket = sess.default_bucket()
# uploads a given file to S3.
upload_path = s3_path_join("s3://",sagemaker_session_bucket,"lab1_model")
print(f"Uploading Model to {upload_path}")
model_uri = S3Uploader.upload('model.tar.gz',upload_path)
print(f"Uploaded model to {model_uri}")

%store model_uri
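From there, the model can be created from the S3 archive instead of the Hub config (a sketch; the task, versions, and instance type are carried over from earlier in the thread, so adjust them to match your model):

from sagemaker.huggingface import HuggingFaceModel

# point model_data at the uploaded archive instead of setting HF_MODEL_ID,
# so nothing needs to be downloaded from the Hub at container startup
huggingface_model = HuggingFaceModel(
    model_data=model_uri,
    transformers_version='4.6',
    pytorch_version='1.7',
    py_version='py36',
    env={'HF_TASK': 'feature-extraction'},  # assumed task; set it to match your model
    role=role
)

batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    strategy='SingleRecord'
)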