InternalServerException from BART model created from S3

Hi there,
I’m having a problem when invoking a SageMaker endpoint for inference on a Hugging Face model created from an S3 bucket.

When I create an endpoint from the Hub it works normally and I can run my inference code successfully. However, when I create an endpoint for the same model stored in S3, it returns an error at inference time.

First I download the bart-large-mnli model, compress it into a tar.gz, and upload it to my S3 bucket as described here.

git lfs install
git clone {repository}

cd {repository}
tar zcvf model.tar.gz *

aws s3 cp model.tar.gz s3://{my-s3-path}

Then I use the following code to create an endpoint from the model stored in the S3 bucket:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

env = {'HF_TASK': 'summarization'}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://<my-bucket-name>/model.tar.gz",  # path to your trained SageMaker model
    role=role,                        # IAM role with permissions to create an endpoint
    transformers_version="4.26",      # Transformers version used
    pytorch_version="1.13",           # PyTorch version used
    py_version='py39',                # Python version used
    env=env,                          # sets HF_TASK for the inference toolkit
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

The deployment works, but when I try to invoke the model for inference using the following code:

import sagemaker
from sagemaker.huggingface import HuggingFacePredictor

sess = sagemaker.Session()
predictor = HuggingFacePredictor(endpoint_name=ENDPOINT, sagemaker_session=sess)

data = {
    "inputs": "I have a problem with my iphone that needs to be resolved asap!!",
    "parameters": {
        "candidate_labels": ["not urgent"],
        "multi_label": True
    }
}

response = predictor.predict(data)


I receive the following error:

botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The following `model_kwargs` are not used by the model: [\u0027candidate_labels\u0027, \u0027multi_label\u0027] (note: typos in the generate arguments will also show up in this list)"

Using the same inference code against the endpoint created from the Hub (deployed with the snippet from the model’s page, as below) works normally.

hub = {
    'HF_MODEL_ID': 'facebook/bart-large-mnli',
    'HF_TASK': 'zero-shot-classification'
}
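For reference, the Hub-based deployment I used looks roughly like this (a sketch based on the deployment snippet from the model page; the instance type and versions are just the values from my setup):

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    env=hub,                          # HF_MODEL_ID + HF_TASK instead of model_data
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)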

Thank you for any help.

This looks like a mismatch between the HF_TASK you set and the payload you sent: you deployed the S3 model with the summarization pipeline ('HF_TASK': 'summarization'), but you are sending a zero-shot-classification payload, so candidate_labels and multi_label are rejected as unused model_kwargs.
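If you want the S3-hosted model to behave like the Hub endpoint, a minimal sketch of the fix (assuming you keep the same model.tar.gz and the rest of your deployment code) is to set HF_TASK to the task the payload expects when creating the model:

env = {'HF_TASK': 'zero-shot-classification'}

huggingface_model = HuggingFaceModel(
    model_data="s3://<my-bucket-name>/model.tar.gz",
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    env=env,                          # task now matches the zero-shot payload
)

Alternatively, keep HF_TASK as summarization and send a summarization payload (just "inputs", without candidate_labels / multi_label).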