InternalServerException from BART model created from S3

Hi there,
I’m having a problem when invoking a SageMaker endpoint for inference with a Hugging Face model created from an S3 bucket.

When I create an endpoint from the Hub it works normally and I can run my inference code successfully. However, when I try to create an endpoint for the same model stored in S3, it returns an error at inference time.

First I download the bart-large-mnli model, compress it into a tar.gz, and upload it to my S3 bucket as described here.

git lfs install
git clone git@hf.co:{repository}

cd {repository}
tar zcvf model.tar.gz *

aws s3 cp model.tar.gz <s3://{my-s3-path}>

Then I use the following code to create an endpoint from the model stored in the S3 bucket:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

env = {'HF_TASK': 'summarization'}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://<my-bucket-name>/model.tar.gz",  # path to your trained SageMaker model
   role=role,                                            # IAM role with permissions to create an endpoint
   env=env,
   transformers_version="4.26",                           # Transformers version used
   pytorch_version="1.13",                                # PyTorch version used
   py_version='py39',                                    # Python version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.xlarge")

The deployment works, but when I try to invoke the model for inference using the following code:

import sagemaker
from sagemaker.huggingface import HuggingFacePredictor

sess = sagemaker.Session()
predictor = HuggingFacePredictor(endpoint_name=ENDPOINT, sagemaker_session=sess)

data = {
    "inputs": "I have a problem with my iphone that needs to be resolved asap!!",
    "parameters": { 
        "candidate_labels": [
                        "urgent",
                        "phone",
                        "computer",
                        "tablet",
                        "not urgent"],
        "multi_label": True
    }
}

print(predictor.predict(data=data))

I receive the following error:

botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The following `model_kwargs` are not used by the model: [\u0027candidate_labels\u0027, \u0027multi_label\u0027] (note: typos in the generate arguments will also show up in this list)"
}

Using the same inference code against the endpoint created from the Hub with the configuration below, as described in the deployment section of the model’s page, works normally.

hub = {
	'HF_MODEL_ID':'facebook/bart-large-mnli',
	'HF_TASK':'zero-shot-classification'
}
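
For reference, the Hub-based endpoint was created with the standard snippet from the model page, roughly like this (same versions and instance type as the S3 deployment are assumed):

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# create Hugging Face Model Class from the Hub config above instead of model_data
huggingface_model = HuggingFaceModel(
   env=hub,                          # HF_MODEL_ID + HF_TASK
   role=role,
   transformers_version="4.26",
   pytorch_version="1.13",
   py_version="py39",
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.xlarge",   # assumed to match the S3 deployment
)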

Thank you for any help.

The HF_TASK you set for the S3 deployment ('summarization') doesn't match 'zero-shot-classification' or the payload you sent. You deployed the model with the summarization pipeline but are sending a zero-shot-classification payload, so candidate_labels and multi_label get forwarded to generate() and are rejected as unused model_kwargs.
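
A minimal sketch of the fix, assuming the model.tar.gz in S3 is the same bart-large-mnli archive you cloned above: redeploy with HF_TASK set to 'zero-shot-classification' so the endpoint builds the pipeline that matches your payload.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# match the task to the payload: zero-shot-classification, not summarization
env = {'HF_TASK': 'zero-shot-classification'}

huggingface_model = HuggingFaceModel(
   model_data="s3://<my-bucket-name>/model.tar.gz",  # same archive as before
   role=role,
   env=env,
   transformers_version="4.26",
   pytorch_version="1.13",
   py_version="py39",
)

predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.xlarge",
)

# the original zero-shot payload should now be accepted unchanged
print(predictor.predict({
    "inputs": "I have a problem with my iphone that needs to be resolved asap!!",
    "parameters": {
        "candidate_labels": ["urgent", "phone", "computer", "tablet", "not urgent"],
        "multi_label": True,
    },
}))

With the task matched to the payload, the same predict call should return label scores instead of the InternalServerException.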