Zero Shot Multi-label text classification on SageMaker

Greetings,

I have developed a script on my computer to do some zero-shot multi-label text classification using xlm-roberta.
I want to reproduce my work on SageMaker using the Hugging Face Inference Toolkit, and I am having some trouble doing so.

Locally, I do the classification as follows:

from transformers import pipeline

classifier = pipeline(model="joeddav/xlm-roberta-large-xnli", task="zero-shot-classification")

predictions = classifier(sequence_to_classify, candidate_labels, multi_label=True)

On SageMaker, I configure the model from the Hub and launch a batch transform job for inference, but I can't seem to find the multi_label parameter in the following:

huggingface_model = HuggingFaceModel(
    transformers_version="4.17.0",
    pytorch_version="1.10.2",
    py_version="py38",
    env=hub,
    role=event['role'])

bt_output_key = f"s3://{event['bucket']}/{event['output_prefix']}/{event['execution_id']}"

hf_transformer = huggingface_model.transformer(
    instance_count=event["instance_count"],
    instance_type=event["instance_type"],
    output_path=bt_output_key,
    strategy="SingleRecord",
    max_concurrent_transforms=event["concurrent_transforms"],
)

hf_transformer.transform(
    data=event['input_s3_path'],
    content_type="application/json",
    split_type="Line",
    wait=False
)

I looked through the environment variables list, but I think I'm missing something.
Thank you for your help.

It looks like you are "customizing" the default behavior of the pipeline with kwargs for multi-label. For this you either need to provide an inference.py that does what you are doing locally, or you need to modify your input JSON Lines file to include those parameters; below is an example. You can also check out the example: notebooks/sagemaker-notebook.ipynb at main · huggingface/notebooks · GitHub

{"inputs": "VirginAmerica plus you've added commercials to the experience... tacky.", "parameters": {"candidate_labels": ["refund", "legal", "faq"], "multi_label": true}}
{"inputs": "VirginAmerica I didn't today... Must mean I need to take another trip!", "parameters": {"candidate_labels": ["refund", "legal", "faq"], "multi_label": true}}
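If it helps, a small script can generate such a JSON Lines file from a list of sequences, so the parameters don't have to be written by hand. This is only a sketch; the helper name, output path, and example sequence are made up:

```python
import json

# Hypothetical helper: writes one JSON object per line, each carrying the
# zero-shot parameters in the shape the batch transform input above uses.
def build_batch_input(sequences, candidate_labels, path):
    with open(path, "w") as f:
        for seq in sequences:
            record = {
                "inputs": seq,
                "parameters": {
                    "candidate_labels": candidate_labels,
                    "multi_label": True,
                },
            }
            f.write(json.dumps(record) + "\n")

build_batch_input(
    ["VirginAmerica plus you've added commercials to the experience... tacky."],
    ["refund", "legal", "faq"],
    "batch_input.jsonl",
)
```

The resulting file can then be uploaded to S3 and passed as `data` to `transform()` with `split_type="Line"`, as in the snippet above.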

Hi, how did you handle the multi-label issue? I've tried to incorporate multi_label in the input JSON line, but the output probabilities don't look like multi-label results: I was expecting the sum of the scores not to equal 1.

predictor_v1.predict({
    "inputs": "coke ",
    "parameters": {"candidate_labels": label_list},
    "multi_label": True
})

Output:

{'sequence': 'coke ',
 'labels': ['drink', 'accessory', 'food', 'snack', 'main dish'],
 'scores': [0.8831921815872192,
  0.06690200418233871,
  0.01697288081049919,
  0.016715381294488907,
  0.01621750369668007]}
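As a sanity check, note that the scores in this output sum to almost exactly 1, which is what the pipeline's default single-label softmax over candidate labels produces; with multi_label=True each label is scored independently and the scores generally do not sum to 1. A quick check using the scores from the output above:

```python
scores = [0.8831921815872192, 0.06690200418233871, 0.01697288081049919,
          0.016715381294488907, 0.01621750369668007]

# Scores summing to ~1 indicate the default softmax across labels was used,
# i.e. the multi_label flag was not picked up by the pipeline.
print(abs(sum(scores) - 1.0) < 1e-6)  # True
```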

It seems that your payload is not correct: multi_label needs to be part of the parameters key.

{
    "inputs": "coke ",
    "parameters": {"candidate_labels": label_list, "multi_label": True}
}

Got it. Thank you so much!

I'm just trying to get this working on batch transform as well. I have a lot of inputs, but also a large list of candidate labels (385, to be exact). This obviously blows out the file size when adding the candidate labels to each input. Is there a way to specify the candidate labels just once for all inputs in a batch transform file?

@adeperio-avarni instead of including the labels in your CSV, you could create a custom inference.py where you store those labels, but this requires manual development.
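A minimal sketch of such an inference.py, assuming the toolkit's model_fn/predict_fn hooks are used; the label list here is a placeholder for the real 385 labels:

```python
# inference.py -- hypothetical custom handler: the candidate labels live here
# instead of being repeated on every line of the batch input file.

CANDIDATE_LABELS = ["refund", "legal", "faq"]  # placeholder for the real 385 labels

def model_fn(model_dir):
    # Called once per worker to load the model. The import is done here so the
    # file stays importable in environments without transformers installed.
    from transformers import pipeline
    return pipeline(task="zero-shot-classification", model=model_dir)

def predict_fn(data, classifier):
    # Each input JSON line now only needs an "inputs" key; the labels and the
    # multi_label flag are fixed in this handler.
    sequence = data.pop("inputs")
    return classifier(sequence, CANDIDATE_LABELS, multi_label=True)
```

With this file bundled under code/ in the model archive, the toolkit should pick it up, and each line of the batch input reduces to just {"inputs": "..."}.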

Thanks @philschmid I think that might be the way forward. I’ll try that!