New pipeline for zero-shot text classification

Yes, it worked! Just 44 seconds for 2,500 rows. Thank you!

2 Likes

Hi @valhalla, thanks for developing onnx_transformers. I have tried it with the zero-shot-classification pipeline and benchmarked ONNX against plain PyTorch, following the benchmark_pipelines notebook. I tried several SageMaker instances with various numbers of cores and CPU types. It seems that an instance with more CPU cores gives more speed-up, but such instances are also more expensive, and at a certain point the price is almost the same as using a GPU.
I wonder if there are other ways to speed things up while keeping the cost minimal. I found that quantization may help, but it seems that onnx_transformers doesn't support ONNX quantization yet. Do you plan to support it? Can you kindly give me some reference for using ONNX quantization with the zero-shot-classification pipeline (with or without onnx_transformers)?
Thanks in advance!
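
For reference, the kind of quantization I have in mind is onnxruntime's dynamic (post-training) quantization, roughly like the sketch below (this assumes the model has already been exported to ONNX; the file names are just placeholders):

from onnxruntime.quantization import quantize_dynamic, QuantType

# quantize the exported model's weights to int8; activations are quantized
# dynamically at inference time
quantize_dynamic(
    model_input="bart-large-mnli.onnx",
    model_output="bart-large-mnli-int8.onnx",
    weight_type=QuantType.QInt8,
)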

Trying to run on a large dataset using 12 labels with no success. I've asked the question on StackOverflow:

My concern is that I keep running out of memory using 57K sentences (read from CSV and fed to the classifier as a list). I'm assuming there's a way to batch process this by perhaps using a dataset. Any recommendations?

UPDATE:
tried using the GPU on Colab: classifier = pipeline('zero-shot-classification', device=0)
and got:
RuntimeError: CUDA out of memory. Tried to allocate 812.01 GiB (GPU 0; 15.90 GiB total capacity; 6.67 GiB already allocated; 6.94 GiB free; 8.09 GiB reserved in total by PyTorch)
Notice how it goes from ~6 GiB to 812 GiB?

1 Like

Another question:
I ran the model using pipeline() and got great results:

while the manual approach described at Zero-Shot Learning in Modern NLP | Joe Davison Blog, using
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')
model = AutoModel.from_pretrained('facebook/bart-large-mnli')

yields different (and terrible) results on the same sentence and labels:

|label: nightlife | similarity: 0.14027079939842224|
|label: arts | similarity: 0.12448159605264664|
|label: stage | similarity: 0.11398478597402573|
|label: accommodation | similarity: 0.10639678686857224|
|label: outdoors | similarity: 0.10298262536525726|
|label: chat | similarity: 0.0851324051618576|
|label: fitness | similarity: 0.0802810788154602|
|label: family | similarity: 0.07305140048265457|
|label: travel | similarity: 0.06645495444536209|
|label: food | similarity: 0.05090881139039993|
|label: sports | similarity: 0.04867491126060486|
|label: health | similarity: 0.046865712851285934|
|label: music | similarity: 0.04231047257781029|
|label: social | similarity: 0.03655364364385605|
|label: shopping | similarity: 0.03481506183743477|
|label: events | similarity: 0.034809011965990067|
|label: fashion | similarity: 0.0223409254103899|
|label: culture | similarity: 0.013726986013352871|
|label: misc | similarity: -0.01880553364753723|

Am I missing something? I'm assuming this is happening because it's not using multi-class classification.

Happy to take a look at your code if you don't mind posting the snippet.

As for your memory errors, the current pipeline implementation doesn't do any mini-batching for you, so you're trying to run the whole dataset through a large transformer in one pass, which would require an incredible amount of memory. We'll hopefully have automatic batching with the upcoming pipelines revamp, but in the meantime just pass each sequence (or a handful of sequences) to the model in a separate call rather than passing the whole dataset as one list.
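
For example, something like this rough sketch (sequences, candidate_labels and the batch size are placeholders for whatever you already have):

from transformers import pipeline

classifier = pipeline("zero-shot-classification", device=0)

batch_size = 8  # tune to whatever fits in memory
results = []
for i in range(0, len(sequences), batch_size):
    out = classifier(sequences[i:i + batch_size], candidate_labels, multi_class=True)
    # the pipeline returns a dict for a single sequence and a list for several
    results += out if isinstance(out, list) else [out]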

Hey people, it seems like a lot of you are interested in speeding up zero-shot. I tried a quick experiment using no-teacher BART distillation for MNLI and achieved impressive scores, with only a very small drop in metrics.

You can try out those models and see if they give similar or good-enough accuracy for zero-shot; they are faster than bart-large-mnli.

All models are available on the hub.
Here is the repo if you want to try them out yourself.
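
They can be dropped straight into the pipeline. A quick sketch with one of the distilled checkpoints (the example sentence and labels are just for illustration):

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")
print(classifier("One day I will see the world.", ["travel", "cooking", "dancing"]))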

Thoughts, suggestions welcome :hugs:

cc @joeddav

3 Likes

This is fantastic! :raised_hands:

Also excited to post for the first time on the Hugging Face forum! :hugs:

@joeddav: I've tried to use the zero-shot pipeline with a dataframe and default params. While the results are accurate, I can't seem to iterate over 50 rows without Colab crashing due to lack of RAM.

Is this expected? Happy to share the code if you like :slight_smile:

Thanks in advance!

Best,
Charly

cc: @valhalla

Hi @charly, please share code, will be happy to take a look!

1 Like

Thanks @valhalla for your help, much appreciated! :slight_smile:

I finally managed to make it faster by switching to the GPU settings :slight_smile:

I'm now facing another issue as I'm trying to deploy that zero-shot code to Heroku or Streamlit Sharing (Streamlit's new hosting service), but the app doesn't work once deployed.

As specified here, I've added these lines to my main .py file:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

model = AutoModel.from_pretrained("facebook/bart-large-mnli")

Here's the error message I get:

OSError: Can't load weights for 'facebook/bart-large-mnli'. Make sure that: - 'facebook/bart-large-mnli' is a correct model identifier listed on 'https://huggingface.co/models' - or 'facebook/bart-large-mnli' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

… So I'm not sure if I'm doing this properly.

Thanks in advance
Charly

Hi @charly, are you still facing this issue?

1 Like

Thanks for asking @valhalla!

Yes, I still have the issue. Let me retry this week - I'll keep you posted! :slight_smile:

That's strange. What version of transformers are you using? I believe Bart used to be a "canonical" model in the library before it was moved to the facebook org, so try just doing bart-large-mnli without the facebook/ if you're using an older version of transformers.

Btw, you'll want to use AutoModelForSequenceClassification rather than AutoModel so that you get the NLI output layer and not just the encoder.
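
A rough sketch of what I mean, following the approach in the blog post (the premise and label here are just for illustration):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

premise = "One day I will see the world."
label = "travel"
hypothesis = f"This example is {label}."

# run the (premise, hypothesis) pair through the NLI head;
# bart-large-mnli's label order is [contradiction, neutral, entailment]
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs)[0]

# drop the neutral logit and softmax over contradiction vs. entailment;
# the entailment probability is the score for the label
prob_label_is_true = logits[:, [0, 2]].softmax(dim=1)[:, 1]
print(prob_label_is_true.item())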

1 Like

Thanks @joeddav,

I've finally managed to get it working, although not yet in a GPU set-up :slight_smile:

By the way, when I remove facebook/ from facebook/bart-large-mnli, as follows:

I get the following issue:

Thanks,
Charly

1 Like

Can you describe in more detail which is better to use, AutoModelForSequenceClassification or AutoModel, to get more accurate predictions from this approach?

Hi @m2rik, I'm curious to see your implementation. I tried the following approach with a GPU in Google Colab: for 20,000 rows (just one sentence per row) and 15 classes, it took 56 minutes.

from transformers import pipeline

# tokenizer, model, df and candidate_labels are defined earlier in the notebook
zsc = pipeline(task='zero-shot-classification', tokenizer=tokenizer, model=model, device=0)

batch_size = 128
sequences = df['idea'].to_list()
list_of_ideas = []
for i in range(0, len(sequences), batch_size):
    list_of_ideas += zsc(sequences[i:i+batch_size], candidate_labels=candidate_labels, multi_class=True)

–
CPU times: user 31min 20s, sys: 24min 38s, total: 55min 58s
Wall time: 55min 58s

Any help is really appreciated.
Thanks.

@nayid You may have seen this already, but I'd use the distilled valhalla/distilbart-mnli-12-3 instead of the default model if you're trying to speed things up. You should get a good boost in speed/memory and it seems to have similar accuracy.
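
With your snippet, the only change needed would be pointing the pipeline at the distilled checkpoint instead of passing your current tokenizer/model objects (the matching tokenizer is then loaded automatically), e.g.:

zsc = pipeline(task='zero-shot-classification', model='valhalla/distilbart-mnli-12-3', device=0)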

1 Like

@joeddav thank you!!
Great speed improvement, almost half the time.

CPU times: user 18min 4s, sys: 13min 37s, total: 31min 41s
Wall time: 31min 42s
:+1: :+1:

2 Likes

No-teacher distillation is really effective!
The distillation paper is out if anyone is interested.

Hello, I want to use the pipeline on my PC with Docker, but the container gets killed. Do you have a solution please? @joeddav

Recreating classification_api_1 … done
Attaching to classification_api_1
api_1 | /app/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
api_1 | return torch._C._cuda_getDeviceCount() > 0
Downloading: 100%|██████████| 908/908 [00:00<00:00, 205kB/s]
Downloading: 100%|██████████| 1.63G/1.63G [10:19<00:00, 2.63MB/s]
api_1 | Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartModel: ['model.encoder.version', 'model.decoder.version']
api_1 | - This IS expected if you are initializing BartModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
api_1 | - This IS NOT expected if you are initializing BartModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading: 100%|██████████| 899k/899k [00:01<00:00, 834kB/s]
Downloading: 100%|██████████| 456k/456k [00:00<00:00, 658kB/s]
Killed

Looks like you're trying to use CUDA without a GPU / proper CUDA installation?
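
If that's the case, a quick thing to try (just a sketch) is pinning the pipeline to CPU explicitly so it never touches CUDA inside the container:

from transformers import pipeline

# device=-1 forces CPU (which is also the default)
classifier = pipeline("zero-shot-classification", device=-1)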