Hello @joeddav, how can I train the zero-shot classification pipeline on my own dataset? I get errors in the classification of some texts, so I want to train this pipeline on my own data. Thank you!
Thanks so much for creating this great pipeline! I've been experimenting with NLI for zero-shot classification and it's really fascinating.
Could you explain a bit more the theoretical or empirical reasons for disregarding the logit for the neutral label? I imagine you are doing this because otherwise the model would be oversensitive and classify too many things as neutral (that's what's happening in my experiments). At the same time, I feel a bit uneasy about simply ignoring this entire label, and it sometimes leads to the model classifying something as "entailed" too easily.
I read Yin et al. (2019), whom you cite in your blog post, and I noticed that they write:
We convert all datasets into binary case: "entailment" vs. "non-entailment", by changing the label "neutral" (if exist in some datasets) into "non-entailment".
So for their experiments they merge "contradiction" and "neutral" into the same category ("non-entailment") in the different NLI datasets even before training. Then they train their base model (BERT or whichever) on these new binary NLI datasets. This means they then only do a softmax over the two logits for entailment and non-entailment (if I understand correctly), and they don't have to disregard a third label because there is none.
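For concreteness, here's a minimal sketch of how I understand that remapping (the example data is made up, not from their code):
# Remap three-way NLI labels to the binary scheme Yin et al. describe:
# "entailment" stays, "neutral" and "contradiction" both become "non-entailment".
label_map = {
    "entailment": "entailment",
    "neutral": "non-entailment",
    "contradiction": "non-entailment",
}

example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "The dog belongs to a child.",
    "label": "neutral",
}
example["label"] = label_map[example["label"]]  # -> "non-entailment"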
I'm wondering if:
- I understood this correctly?
- You think that this leads to a meaningful difference in performance?
- There are other theoretical or empirical reasons why it's fine to simply keep and ignore the neutral label?
(Another, unrelated thought: When I switched from BART-mnli to roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli I got good performance boosts in my small experiments, so maybe that could also be useful for your pipeline. It's great that these SOTA models are freely available via the Hugging Face model hub, so thanks again!)
Is there an easy way to run the inference on multiple GPUs?
Not at the moment, but hopefully in the not-too-distant future.
Hi Joe,
Quick question!
I've created a Streamlit app that leverages the zero-shot classification pipeline. The app iterates over dataframes to categorise each row.
I'd like to deploy on CPU instances (rather than GPUs) to save on costs (heck, these are personal projects!).
It may be a rather noob question, but I was wondering if there is any way to boost speed in a CPU-only setting. At the moment, everything is terribly slow when I try to use the app on my local (no GPU!) machine.
Any guidance would be much appreciated!
Thanks,
Charly
Hey @charly, here's a previous thread about that. The main tricks are going to be:
- Use one of these distilled models, which are smaller and faster but give similar results (see the sketch after this list).
- Run with the ONNX Runtime. One way you can do this is with this project created by @valhalla before he joined Hugging Face.
- If you're classifying long sequences, try truncating to just part of the sequence. That'll give you a speedup, but you'll have to evaluate how it impacts your accuracy.
- If you have a large number of candidate labels, try to come up with a heuristic or use a super lightweight classifier to identify the most likely candidates, and then feed in only those rather than all of them.
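Roughly like this (a hedged sketch combining the first and third tricks; the model choice, labels, and truncation length are just examples you'd want to tune and evaluate):
from transformers import pipeline

# Smaller distilled checkpoint instead of facebook/bart-large-mnli
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")

labels = ["billing", "shipping", "complaint"]  # hypothetical candidate labels
long_text = "My package never arrived and I was charged twice. " * 50

# Keep only the first chunk of a long sequence; check how this affects your results
truncated = long_text[:1000]
result = classifier(truncated, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])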
Btw, if it's public, would you mind linking to your Streamlit app? It's always fun to see the ways people are using it.
Thanks Joe!
Quite a few things to try out, that's exciting!
And yes, I'll definitely share the app here as soon as it runs smoothly enough!
On a side note, I've tried the ONNX code you suggested in Colab, and it threw the following issue when running from onnx_transformers import pipeline:
ModuleNotFoundError: No module named 'transformers.configuration_auto'
Then, if I try to downgrade to transformers==2.5.1 as suggested here, I get another issue:
No module named 'transformers.convert_graph_to_onnx'
Have you come across this issue before?
Thanks
Charly
Hmm not sure. Maybe @valhalla would know?
Hi @charly
Could you try with transformers v3.*? I haven't tested it with v4.*.
BTW @joeddav, have you evaluated the distilled models on zero-shot classification? I'd love to know the metrics.
Hey, thanks for the comment. It's a good question. I think in the multi-class, multi-label case it does make sense to include the neutral label rather than throw it out, and you're right that that is what Yin et al. do. I ran a quick experiment on GoEmotions, which is a multi-label emotion classification corpus. When I modified the code to include the neutral label, it lowered the recall and increased the precision (as you'd expect), with a boost in the overall F1. Maybe we can add an ignore_neutral argument with the default as True for now, with a warning that it will change to False in the future.
I should also note that in the single-label case the pipeline ignores both neutral and contradiction and only does the softmax over the entailment dimension. This might cause similar unease, but so far it seems to work best empirically.
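To make that concrete, here's a rough sketch of both scoring variants, assuming the model's logits are ordered [contradiction, neutral, entailment] (the numbers are made up):
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One row of NLI logits per candidate label (one premise/hypothesis pair each)
logits = np.array([[1.2, 0.3, 2.5],   # candidate label A
                   [2.0, 0.1, 0.4]])  # candidate label B

# Multi-label case: drop the neutral column and softmax entailment vs. contradiction per label
multi_label_scores = softmax(logits[:, [0, 2]], axis=1)[:, 1]

# Single-label case: keep only the entailment logits and softmax across the candidate labels
single_label_scores = softmax(logits[:, 2], axis=0)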
I have, but only on AG's News, which is a pretty easy topic classification dataset with only 4 classes, so the results are far from conclusive. Scores are accuracy.
- facebook/bart-large-mnli: 0.6886842105263158
- valhalla/distilbart-mnli-12-1: 0.7248684210526316
- valhalla/distilbart-mnli-12-3: 0.6981578947368421
- valhalla/distilbart-mnli-12-6: 0.7277631578947369
- valhalla/distilbart-mnli-12-9: 0.689078947368421
Kinda funny how the smallest one did several points better than the original, but again it's just one easy/small dataset, so I don't think we can say much from it.
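For reference, a rough sketch of how such an evaluation can be run (not the exact script behind the numbers above; it assumes the Hugging Face ag_news dataset and a small test slice):
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset("ag_news", split="test[:500]")  # small slice to keep it quick
label_names = ["World", "Sports", "Business", "Sci/Tech"]

classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-1")

correct = 0
for example in dataset:
    pred = classifier(example["text"], candidate_labels=label_names)
    if label_names.index(pred["labels"][0]) == example["label"]:
        correct += 1
print("accuracy:", correct / len(dataset))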
I'm using the model deployed behind a Flask API (a single instance of the model serves all requests), but when I send two requests at the same time, I receive an error, the same as in this issue on GitHub: https://github.com/huggingface/tokenizers/issues/537
RuntimeError: Already borrowed
https://github.com/huggingface/tokenizers/blob/598ce61229c789465966682687fa12a90ec58074/bindings/python/py_src/tokenizers/implementations/base_tokenizer.py#L107-L123
model = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli', device=0)
model(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
According to the issue on GitHub, I need to use a different tokenizer for each request, but AFAIK the tokenizer is tied to the model, so how can I make it work?
GitHub suggestion:
I think the easiest way to fix it, for now, will be to ensure you have an instance of the tokenizer for each thread
I suspect that's something to do with the Rust backend of our fast tokenizers. Try passing use_fast=False when you call pipeline.
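Something like this (the same call as above, just with the fast tokenizer disabled):
from transformers import pipeline

# The slow (Python) tokenizer avoids the "Already borrowed" error from the Rust-backed fast tokenizer
model = pipeline('zero-shot-classification', model='joeddav/xlm-roberta-large-xnli', use_fast=False, device=0)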
Thanks @valhalla! It worked with transformers v3!
A new issue arose, however, when trying to run onnx_transformers on my local machine (CPU, Dell Latitude 7490, Windows 10 x64):
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type
for the following line:
File "C:\Users\Desktop\OneShotClass\ONNX_CPU.py", line 13, in <module>
classifier = pipeline("zero-shot-classification", onnx=True)
Also FYI, here's the code:
import pandas as pd
import numpy as np
import streamlit as st
from onnx_transformers import pipeline
classifier = pipeline("zero-shot-classification", onnx=True)
Have you guys come across that issue before?
cc @joeddav
Thanks,
Charly
Could you post the full stack trace?
Sure, here it is:
InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type
Traceback:
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\streamlit\script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "C:\Users\Charly\Desktop\ONNX-CPU-Zero-Shot-Class-Test\app.py", line 5, in <module>
classifier = pipeline("zero-shot-classification", onnx=True)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 1771, in pipeline
return task_class(
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 925, in __init__
super().__init__(*args, args_parser=args_parser, **kwargs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 559, in __init__
self._warup_onnx_graph()
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 730, in _warup_onnx_graph
self._forward_onnx(model_inputs)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnx_transformers\pipelines.py", line 724, in _forward_onnx
predictions = self.onnx_model.run(None, inputs_onnx)
File "c:\users\charly\desktop\onnx-cpu-zero-shot-class-test\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 124, in run
return self._sess.run(output_names, input_feed, run_options)
Thanks,
Charly
Hi guys,
Not sure if anybody has the same problem, but I don't see any difference in speed when batching versus passing a single input. It is often the opposite: inferring one example at a time is faster...
I have 10 classes, and it is taking around 3 seconds per prediction on average.
I would like to batch multiple inputs in order to reduce latency.
This is the code:
classifier = pipeline("zero-shot-classification", model='valhalla/distilbart-mnli-12-3', device=0)
classifier(batch, categories, multi_class=True)
Any thoughts on this?
Thank you!
I'm not sure why you're seeing latency that poor, but one thing to keep in mind is that if you feed N sequences and K classes through the pipeline, the true batch size is not N but N × K, because every sequence/label pair has to be fed through the model separately. So when you increase your batch size from 1 to 10 with K = 10, you're actually increasing the true batch size from 10 to 100, not from 1 to 10, hence the relatively insignificant speedup. Batching is happening under the hood even with N = 1.
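As a concrete illustration of that pair expansion (using the pipeline's default hypothesis template and made-up inputs):
sequences = [f"example text {i}" for i in range(10)]   # N = 10 sequences
labels = [f"label {j}" for j in range(10)]             # K = 10 candidate labels
template = "This example is {}."                       # default hypothesis template

# Every sequence is paired with every candidate label before the NLI forward pass
pairs = [(seq, template.format(lab)) for seq in sequences for lab in labels]
print(len(pairs))  # 100 = N * K examples actually fed through the model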
That said, 3 seconds per prediction definitely seems too slow on GPU, especially with that model. Did you ever figure out what was causing the latency?
Can the DataParallel module be used to parallelize across multiple GPUs? (See DataParallel in the PyTorch master documentation.)