New pipeline for zero-shot text classification

Hello @joeddav, how can I train the zero-shot classification pipeline on my own dataset? I get errors in the classification of some texts, so I would like to fine-tune this pipeline on my data set. Thank you.

Thanks so much for creating this great pipeline! I've been experimenting with NLI for zero-shot classification and it's really fascinating.

Could you explain a bit more the theoretical or empirical reasons for disregarding the logit for the neutral label? I imagine you do this because otherwise the model would be oversensitive and classify too many things as neutral (that's what's happening in my experiments). At the same time I feel a bit uneasy about simply ignoring this entire label, and it sometimes leads the model to classify something as 'entailed' too easily.

I read Yin et al. 2019, whom you quote in your blog post, and noticed that they write:

We convert all datasets into binary case: "entailment" vs. "non-entailment", by changing the label "neutral" (if exist in some datasets) into "non-entailment".

So for their experiments they merge 'contradiction' and 'neutral' into the same category ('non-entailment') in the different NLI datasets even before training. Then they train their base model (BERT or whichever) on these new binary NLI datasets. This means they then only take the softmax over the two logits for entailment and non-entailment (if I understand correctly) and don't have to disregard a third label, because there is none.
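If I read that correctly, the preprocessing step would look roughly like this. This is just a sketch using the datasets library; the label mapping (0 = entailment, 1 = neutral, 2 = contradiction) is the one used by the multi_nli dataset on the Hub.

from datasets import load_dataset

# MNLI labels on the Hub: 0 = entailment, 1 = neutral, 2 = contradiction
mnli = load_dataset("multi_nli", split="train")

def binarize(example):
    # Merge "neutral" and "contradiction" into one "non-entailment" class:
    # 0 = entailment, 1 = non-entailment
    example["label"] = 0 if example["label"] == 0 else 1
    return example

binary_mnli = mnli.map(binarize)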

Iā€™m wondering if:

  1. I understood this correctly?
  2. You think that this leads to a meaningful difference in performance?
  3. There are other theoretical or empirical reasons why it's fine to simply keep and ignore the neutral label?

(Another, unrelated thought: when I switched from BART-mnli to roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli I got good performance boosts in my small experiments; maybe that could also be useful for your pipeline. It's great that these SOTA models are freely available via the Hugging Face model hub, so thanks again :slight_smile: )

Is there an easy way to run the inference on multiple GPUs?

Not at the moment, but hopefully in the not-too-distant future.

Hi Joe,

Quick question! :slight_smile:

I've created a Streamlit app that leverages the zero-shot classification pipeline. The app iterates over dataframes to categorise each row.
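The per-row pattern is roughly this (a simplified sketch with placeholder column names and labels, not the actual app code):

import pandas as pd
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

df = pd.DataFrame({"text": ["The stock market rallied today.", "The team won the final."]})
candidate_labels = ["business", "sports", "politics"]

# Classify each row and keep the highest-scoring label
df["category"] = [classifier(text, candidate_labels)["labels"][0] for text in df["text"]]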

I'd like to deploy on CPU instances (rather than GPUs) to save on costs (heck, these are personal projects! :sweat_smile:)

So, it may be a rather noob question, but I was wondering if there is any way to boost speed in the CPU setting. At the moment, everything is terribly slow when I try to use the app on my local (no GPU!) machine.

Any guidance would be much appreciated! :pray:

Thanks,
Charly

Hey @charly, here's a previous thread about that. The main tricks are going to be:

  • Use one of these distilled models, which are smaller and faster but give similar results (a combined sketch of this and the truncation trick follows this list).
  • Run with the ONNX Runtime. One way you can do this is with this project created by @valhalla before he joined Hugging Face.
  • If you have long sequences you're classifying, you can try truncating to just part of the sequence. That'll give you a speedup, but you'll have to evaluate how it impacts your performance.
  • If you have a large number of candidate labels, try to come up with a heuristic or use a super lightweight classifier to identify the most likely candidates, and then feed in just those rather than all of them.
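As a rough illustration of the first and third tricks combined (the 1000-character cut-off is an arbitrary assumption you would want to tune against your own accuracy):

from transformers import pipeline

# A distilled MNLI checkpoint: smaller and faster than bart-large-mnli on CPU
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-1")

candidate_labels = ["business", "sports", "politics"]
long_text = "..."  # a long document from your dataframe

# Crude truncation to the first 1000 characters as a speed/accuracy trade-off
result = classifier(long_text[:1000], candidate_labels)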

Btw if it's public would you mind linking to your Streamlit app? It's always fun to see the ways that people are using it :blush:

Thanks Joe!

Quite a few things to try out, that's exciting! :raised_hands:

And yes, I'll definitely share the app here as soon as it runs smoothly enough! :sweat_smile:

On a side note, I've tried the ONNX code you suggested in Colab, and it threw the following error on from onnx_transformers import pipeline:

ModuleNotFoundError: No module named 'transformers.configuration_auto'

Then, if I try to downgrade to transformers==2.5.1 as suggested here, I get another issue:

No module named 'transformers.convert_graph_to_onnx'

Have you come across this issue before?

Thanks
Charly

Hmm not sure. Maybe @valhalla would know?

Hi @charly

Could you try with transformers v3.*? I haven't tested it with v4.*.

BTW @joeddav, have you evaluated the distilled models on zero-shot? Would love to know the metrics :slight_smile:

Hey, thanks for the comment. It's a good question. I think in the multi-class, multi-label case it does make sense to include the neutral label rather than throw it out, and you're right that that is what Yin et al. do. I ran a quick experiment on GoEmotions, which is a multi-label emotion classification corpus. When I modified the code to include the neutral label, it lowered the recall and increased the precision (as you'd expect) with a boost in the overall F1. Maybe we can add an ignore_neutral argument with the default as True for now, with a warning that it will change to False in the future.

I should also note that in the single-label case the pipeline ignores both neutral and contradiction and only does the softmax over the entailment dimension. This might cause similar unease, but so far it seems to work best empirically.
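To make the two multi-label options concrete, here is a rough sketch of the scoring step on raw NLI logits. The label order below is purely for illustration; the real indices come from the model's config.

import torch

# NLI logits for one (sequence, candidate-label) pair; the order
# [contradiction, neutral, entailment] is assumed here for illustration
# (in practice it comes from model.config.label2id)
logits = torch.tensor([0.3, 1.2, 2.1])
contradiction, neutral, entailment = 0, 1, 2

# Current multi-label behaviour: drop the neutral logit, softmax over
# contradiction vs. entailment, and take the entailment probability
p_ignore_neutral = logits[[contradiction, entailment]].softmax(dim=0)[1]

# One way to include neutral: softmax over all three logits, so neutral
# and contradiction together play the role of "non-entailment"
p_with_neutral = logits.softmax(dim=0)[entailment]

print(p_ignore_neutral.item(), p_with_neutral.item())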

I have, but only on AG's News, which is a pretty easy topic classification dataset with only 4 classes, so the results are far from conclusive. Scores are accuracy.

  • facebook/bart-large-mnli: 0.6886842105263158
  • valhalla/distilbart-mnli-12-1: 0.7248684210526316
  • valhalla/distilbart-mnli-12-3: 0.6981578947368421
  • valhalla/distilbart-mnli-12-6: 0.7277631578947369
  • valhalla/distilbart-mnli-12-9: 0.689078947368421

Kinda funny how the smallest one did several points better than the original, but again it's just one easy/small dataset so I don't think we can say much from it.
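For anyone wanting to run a similar check, here's a rough sketch of that kind of evaluation with the datasets library (not necessarily the exact script behind the numbers above; the label names come straight from the ag_news dataset):

from datasets import load_dataset
from transformers import pipeline

ag_news = load_dataset("ag_news", split="test")
label_names = ag_news.features["label"].names  # ['World', 'Sports', 'Business', 'Sci/Tech']

classifier = pipeline("zero-shot-classification",
                      model="valhalla/distilbart-mnli-12-1", device=0)

correct = 0
for example in ag_news:
    predicted = classifier(example["text"], label_names)["labels"][0]
    correct += int(predicted == label_names[example["label"]])

print("accuracy:", correct / len(ag_news))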

I'm using the model deployed behind a Flask API (just one instance of the model is shared by all requests), but when I send two requests at the same time, I receive an error, the same as in this issue on GitHub: https://github.com/huggingface/tokenizers/issues/537

RuntimeError: Already borrowed
https://github.com/huggingface/tokenizers/blob/598ce61229c789465966682687fa12a90ec58074/bindings/python/py_src/tokenizers/implementations/base_tokenizer.py#L107-L123

model = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli', device=0)
model(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)

According to the issue on GitHub, I need to use a different tokenizer for each request, but AFAIK the tokenizer is tied to the model, so how can I make it work?

GitHub suggestion:

I think the easiest way to fix it, for now, will be to ensure you have an instance of the tokenizer for each thread

I suspect that's something to do with the Rust backend of our fast tokenizers. Try passing use_fast=False when you call pipeline.
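Something along these lines, as a minimal sketch; the lock-based variant is just one way to follow the GitHub suggestion of not sharing a single fast tokenizer across threads:

from threading import Lock
from transformers import pipeline

# Option 1: load the slow (pure-Python) tokenizer, which is not affected by
# the Rust "Already borrowed" error, at some cost in tokenization speed
classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli",
                      use_fast=False, device=0)

# Option 2: keep the fast tokenizer but serialize access across Flask threads
lock = Lock()

def classify(sequence, candidate_labels, hypothesis_template):
    with lock:
        return classifier(sequence, candidate_labels,
                          hypothesis_template=hypothesis_template)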

Thanks @valhalla! It worked with transformers v3!

A new issue arose, however, when trying to run onnx_transformers on my local machine (CPU, Dell Latitude 7490, Windows 10 x64):

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type

for the following line:

File "C:\Users\Desktop\OneShotClass\ONNX_CPU.py", line 13, in <module>
    classifier = pipeline("zero-shot-classification", onnx=True)

Also FYI, here's the code:

import pandas as pd
import numpy as np
import streamlit as st
from onnx_transformers import pipeline
classifier = pipeline("zero-shot-classification", onnx=True) 

Have you guys come across that issue before?

cc @joeddav

Thanks,
Charly

Could you post the full stack trace?

Sure, here it is:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type
Traceback:
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\streamlit\script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "C:\Users\Charly\Desktop\ONNX-CPU-Zero-Shot-Class-Test\app.py", line 5, in <module>
    classifier = pipeline("zero-shot-classification", onnx=True)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 1771, in pipeline
    return task_class(
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 925, in __init__
    super().__init__(*args, args_parser=args_parser, **kwargs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 559, in __init__
    self._warup_onnx_graph()
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 730, in _warup_onnx_graph
    self._forward_onnx(model_inputs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 724, in _forward_onnx
    predictions = self.onnx_model.run(None, inputs_onnx)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 124, in run
    return self._sess.run(output_names, input_feed, run_options)

Thanks,
Charly

Hi guys,
Not sure if anybody has the same problem, but I don't see any difference in speed when batching versus passing a single input. It is often the opposite: inferring one example at a time is faster...
I have 10 classes and it is taking on average around 3 seconds per prediction.
I would like to batch multiple inputs in order to reduce the latency.

This is the code:

classifier = pipeline("zero-shot-classification", model=f'valhalla/distilbart-mnli-12-3', device=0)
classifier(batch, categories, multi_class=True)

Any thoughts on this?
Thank you!

I'm not sure why you're seeing latency that poor, but one thing to keep in mind is that if you feed N sequences and K classes through the pipeline, the true batch size is not N but N × K. This is because every sequence/label pair has to be fed through the model separately. So when you increase the size of your batch from 1 to 10 but have K = 10, you're actually increasing the true batch size from 10 to 100, not from 1 to 10. Hence the relatively insignificant speedup. Batching is happening under the hood even with N = 1.
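A quick sketch of that arithmetic, just counting the sequence/label pairs that actually go through the model:

sequences = ["some input text"] * 10                   # N = 10 sequences
candidate_labels = [f"label_{i}" for i in range(10)]   # K = 10 classes

# Each (sequence, label) pair is one NLI example for the model,
# so the true batch size is N * K, not N
true_batch_size = len(sequences) * len(candidate_labels)
print(true_batch_size)  # 100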

That said, 3 seconds per prediction definitely seems too slow on GPU esp. with that model. Did you ever figure out what was causing the latency?

Can the DataParallel module be used to parallelize inference across multiple GPUs? DataParallel - PyTorch master documentation