Hello @joeddav, how can I train the zero-shot classification pipeline on my own dataset? It misclassifies some of my texts, so I would like to fine-tune the pipeline on my data. Thank you!
Thanks so much for creating this great pipeline! I've been experimenting with NLI for zero-shot classification and it's really fascinating.
Could you explain a bit more the theoretical or empirical reasons for disregarding the logit for the neutral label? I imagine you do this because otherwise the model would be oversensitive and classify too many things as neutral (that's what's happening in my experiments). At the same time, I feel a bit uneasy about simply ignoring this entire label, and it sometimes leads to the model classifying something as "entailed" too easily.
I read Yin et al. 2019, whom you quote in your blog post, and I noticed that they write:
We convert all datasets into binary case: "entailment" vs. "non-entailment", by changing the label "neutral" (if exist in some datasets) into "non-entailment".
So for their experiments they merge "contradiction" and "neutral" into the same category ("non-entailment") in the different NLI datasets even before training. Then they train their base model (BERT or whichever) on these new binary NLI datasets. This means that they then only do a softmax over the two logits for entailment and non-entailment (if I understand correctly), and they don't have to disregard a third label because there is none.
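(Just to check my understanding: scoring one candidate label would then reduce to a binary softmax over two logits, something like this toy sketch with made-up values, not their actual code:)

```python
import torch

# Toy sketch: with a binary NLI head there are only two logits per
# premise/hypothesis pair, so nothing has to be thrown away.
logits = torch.tensor([2.1, -0.4])  # [entailment, non-entailment], made-up values
probs = logits.softmax(dim=-1)
p_label = probs[0].item()  # probability that the candidate label is entailed
print(p_label)
```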
I'm wondering if:
- I understood this correctly?
- You think that this leads to a meaningful difference in performance?
- There are other theoretical or empirical reasons why it's fine to simply keep and ignore the neutral label?
(Another, unrelated thought: when I switched from BART-MNLI to roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli, I got good performance boosts in my small experiments, so maybe that could also be useful for your pipeline. It's great that these SOTA models are freely available via the Hugging Face model hub, so thanks again!)
Is there an easy way to run the inference on multiple GPUs?
Not at the moment, but hopefully in the not-too-distant future.
Hi Joe,
Quick question!
I've created a Streamlit app that leverages the zero-shot classification pipeline. The app iterates over dataframes to categorise each row.
I'd like to deploy it on CPU instances (rather than GPUs) to save on costs (heck, these are personal projects!).
So, it may be a rather noob question, but I was wondering whether there is any way to boost speed on CPU. At the moment everything is terribly slow when I run the app on my local (no GPU!) machine.
Any guidance would be much appreciated!
Thanks,
Charly
Hey @charly, here's a previous thread about that. The main tricks are going to be:
- Use one of these distilled models, which are smaller and faster but give similar results (rough example below)
- Run with the ONNX Runtime. One way you can do this is with this project created by @valhalla before he joined Hugging Face
- If you're classifying long sequences, you can try truncating to just part of the sequence (also shown in the sketch below). That'll give you a speedup, but you'll have to evaluate how it impacts your accuracy.
- If you have a large number of candidate labels, try to come up with a heuristic or use a super lightweight classifier to identify the most likely candidates, and then feed in only those rather than all of them.
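Here's a rough sketch of what the first and third points could look like together; the 1000-character cutoff and the example text are just placeholders you'd tune and replace for your own data:

```python
from transformers import pipeline

# Point 1: a smaller distilled checkpoint instead of facebook/bart-large-mnli
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")

def classify(text, candidate_labels, max_chars=1000):
    # Point 3: only feed the beginning of long documents; the cutoff is a
    # placeholder you'd want to tune against your own accuracy numbers.
    return classifier(text[:max_chars], candidate_labels)

print(classify(
    "The stock market rallied after the quarterly earnings report...",
    ["business", "sports", "politics", "technology"],
))
```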
Btw, if it's public, would you mind linking to your Streamlit app? It's always fun to see the ways people are using it.
Thanks Joe!
Quite a few things to try out, that's exciting!
And yes, I'll definitely share the app here as soon as it runs smoothly enough!
On a side note, I tried the ONNX code you suggested in Colab, and it threw the following error when trying to run from onnx_transformers import pipeline:
ModuleNotFoundError: No module named 'transformers.configuration_auto'
Then, if I downgrade to transformers==2.5.1 as suggested here, I get another error:
No module named 'transformers.convert_graph_to_onnx'
Have you come across this issue before?
Thanks
Charly
Hmm not sure. Maybe @valhalla would know?
Hi @charly
Could you try with transformers v3.*? I haven't tested it with v4.*.
BTW @joeddav, have you evaluated the distilled models on zero-shot? Would love to know the metrics.
Hey, thanks for the comment. It's a good question. I think in the multi-class, multi-label case it does make sense to include the neutral label rather than throw it out, and you're right that that is what Yin et al. do. I ran a quick experiment on GoEmotions, which is a multi-label emotion classification corpus. When I modified the code to include the neutral label, it lowered the recall and increased the precision (as you'd expect), with a boost in the overall F1. Maybe we can add an ignore_neutral argument with the default as True for now, with a warning that it will change to False in the future.
I should also note that in the single-label case the pipeline ignores both neutral and contradiction and only does the softmax over the entailment dim. This might cause similar unease, but so far it empirically seems to work best.
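To make that concrete, here's roughly the idea in code; it's a sketch rather than the pipeline's actual implementation, with made-up logits and assuming the per-pair label order [contradiction, neutral, entailment]:

```python
import torch

# Made-up NLI logits for 3 candidate labels; one row per sequence/label pair,
# columns assumed to be [contradiction, neutral, entailment].
logits = torch.tensor([
    [0.2, 1.5, 2.3],   # label A
    [1.9, 0.7, -0.3],  # label B
    [0.1, 2.2, 0.4],   # label C
])

# Multi-label, current behaviour: drop neutral, softmax entailment vs. contradiction per label
entail_vs_contra = logits[:, [0, 2]].softmax(dim=-1)[:, 1]

# Multi-label, "keep neutral": softmax over all three classes, take the entailment probability
entail_with_neutral = logits.softmax(dim=-1)[:, 2]

# Single-label: softmax over the entailment logit across all candidate labels
single_label = logits[:, 2].softmax(dim=-1)

print(entail_vs_contra, entail_with_neutral, single_label)
```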
I have, but only on AG's News, which is a pretty easy topic classification dataset with only 4 classes, so the results are far from conclusive. Scores are accuracy:
- facebook/bart-large-mnli: 0.6886842105263158
- valhalla/distilbart-mnli-12-1: 0.7248684210526316
- valhalla/distilbart-mnli-12-3: 0.6981578947368421
- valhalla/distilbart-mnli-12-6: 0.7277631578947369
- valhalla/distilbart-mnli-12-9: 0.689078947368421
Kinda funny how the smallest one did several points better than the original, but again it's just one easy/small dataset, so I don't think we can say much from it.
I'm using the model behind a Flask API (just one instance of the model is shared by all requests), but when I send two requests at the same time, I receive an error, the same as in this issue on GitHub https://github.com/huggingface/tokenizers/issues/537:
RuntimeError: Already borrowed
https://github.com/huggingface/tokenizers/blob/598ce61229c789465966682687fa12a90ec58074/bindings/python/py_src/tokenizers/implementations/base_tokenizer.py#L107-L123
model = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli', device=0)
model(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
According to the issue on GitHub, I need to use a different tokenizer for each request, but AFAIK the tokenizer is linked to the model, so how can I make it work?
GitHub suggestion:
I think the easiest way to fix it, for now, will be to ensure you have an instance of the tokenizer for each thread
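My rough reading of that suggestion is something like the sketch below (untested, and I'm not sure how it fits with the pipeline API, which is exactly what I'm asking about):

```python
import threading
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "joeddav/xlm-roberta-large-xnli"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)  # shared by all threads
_local = threading.local()

def get_tokenizer():
    # Lazily create one tokenizer per thread so the fast (Rust) tokenizer is
    # never used by two threads at once.
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    return _local.tokenizer

def entailment_logits(premise, hypothesis):
    inputs = get_tokenizer()(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits
```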
I suspect that has something to do with the Rust backend of our fast tokenizers. Try passing use_fast=False when you call pipeline.
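For example (an untested sketch, just your snippet with use_fast=False added and placeholder inputs):

```python
from transformers import pipeline

# Untested sketch: use_fast=False falls back to the pure-Python tokenizer,
# which should avoid the Rust tokenizer's "Already borrowed" error when a
# single pipeline object is shared across threads.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
    use_fast=False,
    device=0,
)

print(classifier(
    "Quiero abrir una cuenta bancaria",    # placeholder sequence
    ["banking", "travel", "cooking"],      # placeholder candidate labels
    hypothesis_template="This example is {}.",
))
```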
Thanks @valhalla! It worked with transformers v3!
A new issue arose, however, when trying to run onnx_transformers on my local machine (CPU, Dell Latitude 7490, Windows 10 x64):
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type
for the following line:
File "C:\Users\Desktop\OneShotClass\ONNX_CPU.py", line 13, in <module>
classifier = pipeline("zero-shot-classification", onnx=True)
Also FYI, here's the code:
import pandas as pd
import numpy as np
import streamlit as st
from onnx_transformers import pipeline
classifier = pipeline("zero-shot-classification", onnx=True)
Have you guys come across that issue before?
cc @joeddav
Thanks,
Charly
Could you post the full stack trace?
Sure, here it is:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type
Traceback:
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\streamlit\script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "C:\Users\Charly\Desktop\ONNX-CPU-Zero-Shot-Class-Test\app.py", line 5, in <module>
classifier = pipeline("zero-shot-classification", onnx=True)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 1771, in pipeline
return task_class(
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 925, in __init__
super().__init__(*args, args_parser=args_parser, **kwargs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 559, in __init__
self._warup_onnx_graph()
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 730, in _warup_onnx_graph
self._forward_onnx(model_inputs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 724, in _forward_onnx
predictions = self.onnx_model.run(None, inputs_onnx)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 124, in run
return self._sess.run(output_names, input_feed, run_options)
Thanks,
Charly
Hi guys,
Not sure if anybody has the same problem, but I don't see any difference in speed when batching versus passing a single input. It is often the opposite: inferring one example at a time is faster.
I have 10 classes and it is taking around 3 seconds per prediction on average.
I would like to batch multiple inputs in order to reduce the latency.
This is the code:
classifier = pipeline("zero-shot-classification", model='valhalla/distilbart-mnli-12-3', device=0)
classifier(batch, categories, multi_class=True)
Any thoughts on this?
Thank you!
I'm not sure why you're seeing latency that poor, but one thing to keep in mind is that if you feed N sequences and K classes through the pipeline, the true batch size is not N but N × K. This is because every sequence/label pair has to be fed through the model separately. So when you increase the size of your batch from 1 to 10 with K = 10, you're actually increasing the true batch size from 10 to 100, not from 1 to 10, hence the relatively insignificant speedup. Batching is happening under the hood even with N = 1.
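Concretely, the pipeline builds one premise/hypothesis pair per sequence/label combination, so the amount of work scales roughly like this (an illustration of the counting, not the pipeline's internal code):

```python
# Rough illustration of why the effective batch size is N * K:
# every (sequence, label) combination becomes its own NLI example.
sequences = [f"example {i}" for i in range(10)]        # N = 10
labels = ["politics", "sports", "tech", "finance",
          "health", "travel", "food", "music",
          "science", "art"]                            # K = 10

pairs = [(seq, f"This example is {label}.") for seq in sequences for label in labels]
print(len(pairs))  # 100 forward examples, not 10
```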
That said, 3 seconds per prediction definitely seems too slow on GPU, especially with that model. Did you ever figure out what was causing the latency?
Can the DataParallel module be used to parallelize across multiple GPUs? (See DataParallel in the PyTorch master documentation.)