Yes, it worked! Just 44 seconds for 2,500 rows. Thank you!
Hi @valhalla, thanks for developing onnx_transformers. I have tried it with the zero-shot-classification pipeline and benchmarked ONNX against plain PyTorch, following the benchmark_pipelines notebook. I tried several SageMaker instances with various numbers of cores and CPU types. It seems that an instance with more CPU cores gives more speed-up, but it is also more expensive, and at a certain point the price is almost the same as using a GPU.
I wonder if there are other ways to speed things up while keeping the cost minimal. I found that quantization may help, but it seems that onnx_transformers doesn't support ONNX quantization yet. Do you have plans to support it? Could you kindly point me to a reference for using ONNX quantization with the zero-shot-classification pipeline (with or without onnx_transformers)?
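For context, here is roughly the kind of thing I had in mind (just a sketch, assuming the model has already been exported to an ONNX file; the file paths are placeholders):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamically quantize the exported graph's weights to int8 (paths are placeholders).
quantize_dynamic(
    model_input="model.onnx",         # ONNX graph exported from the MNLI model
    model_output="model-quant.onnx",  # int8-quantized graph to load back into the pipeline
    weight_type=QuantType.QInt8,
)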
Thanks in advance!
Trying to run on a large dataset using 12 labels with no success. I've asked the question on StackOverflow:
My concern is that I keep running out of memory using 57K sentences (read from CSV and fed to the classifier as a list). I'm assuming there's a way to batch process this by perhaps using a dataset. Any recommendations?
UPDATE:
tried using the GPU on Colab: classifier = pipeline("zero-shot-classification", device=0)
and got:
RuntimeError: CUDA out of memory. Tried to allocate 812.01 GiB (GPU 0; 15.90 GiB total capacity; 6.67 GiB already allocated; 6.94 GiB free; 8.09 GiB reserved in total by PyTorch)
Notice how it goes from 6GiB to 812?
Another question:
I ran the model using pipeline() and got great results:
while using the manual approach described at Zero-Shot Learning in Modern NLP | Joe Davison Blog using
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModel.from_pretrained("facebook/bart-large-mnli")
yields different (and terrible) results on the same sentence and labels:
|label: nightlife | similarity: 0.14027079939842224|
|label: arts | similarity: 0.12448159605264664|
|label: stage | similarity: 0.11398478597402573|
|label: accommodation | similarity: 0.10639678686857224|
|label: outdoors | similarity: 0.10298262536525726|
|label: chat | similarity: 0.0851324051618576|
|label: fitness | similarity: 0.0802810788154602|
|label: family | similarity: 0.07305140048265457|
|label: travel | similarity: 0.06645495444536209|
|label: food | similarity: 0.05090881139039993|
|label: sports | similarity: 0.04867491126060486|
|label: health | similarity: 0.046865712851285934|
|label: music | similarity: 0.04231047257781029|
|label: social | similarity: 0.03655364364385605|
|label: shopping | similarity: 0.03481506183743477|
|label: events | similarity: 0.034809011965990067|
|label: fashion | similarity: 0.0223409254103899|
|label: culture | similarity: 0.013726986013352871|
|label: misc | similarity: -0.01880553364753723|
Am I missing something? I'm assuming this is happening because it's not using multi-class mode.
Happy to take a look at your code if you don't mind posting the snippet.
As for your memory errors, the current pipeline implementation doesn't do any mini-batching for you, so you're trying to run the whole dataset through a large transformer in one pass, which would require an incredible amount of memory. We'll hopefully have automatic batching with the upcoming pipelines revamp, but in the meantime just pass each sequence (or a handful of sequences) to the model in a separate call rather than passing the whole dataset as one list.
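Something like this is all I mean, as a rough sketch (the labels and sentences here are placeholders for your own CSV data):

from transformers import pipeline

classifier = pipeline("zero-shot-classification")  # add device=0 if a GPU is available

candidate_labels = ["travel", "food", "sports"]    # placeholder labels
sequences = ["one day I will see the world", "I need new running shoes"]  # in practice, the list read from your CSV

batch_size = 8
results = []
for i in range(0, len(sequences), batch_size):
    batch = sequences[i:i + batch_size]
    out = classifier(batch, candidate_labels, multi_class=True)
    # The pipeline returns a dict for a single sequence and a list for several,
    # so normalize to a list before accumulating.
    results.extend(out if isinstance(out, list) else [out])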
Hey people, it seems like a lot of you are interested in speeding up zero-shot. I tried one quick experiment using no-teacher BART distillation for MNLI and achieved impressive scores, with a very small drop in metrics.
You can try out those models and see if they give similar / good-enough accuracy for zero-shot; they are faster than bart-large-mnli.
All the models are available on the hub.
Repo if you want to try it out yourself.
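Swapping one of the distilled checkpoints into the pipeline should be a drop-in change, for example (a sketch using valhalla/distilbart-mnli-12-3 as one of them; the other hub checkpoints work the same way):

from transformers import pipeline

# Load the distilled MNLI checkpoint instead of the default bart-large-mnli.
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")

print(classifier("I love hiking in the mountains",
                 candidate_labels=["outdoors", "music", "food"],
                 multi_class=True))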
Thoughts, suggestions welcome
cc @joeddav
This is fantastic!
Also excited to post for the first time on the Hugging Face forum!
@joeddav: I've tried to use the zero-shot pipeline with a dataframe and default params. While the results are accurate, I can't seem to iterate over 50 rows without Colab crashing due to lack of RAM.
Is this expected? Happy to share the code if you like
Thanks in advance!
Best,
Charly
cc: @valhalla
Hi @charly, please share code, will be happy to take a look!
Thanks @valhalla for your help, much appreciated!
I finally managed to make it faster by switching to the GPU settings
I'm now facing another issue as I'm trying to deploy that zero-shot code to Heroku or Streamlit Sharing (Streamlit's new hosting service), but the app doesn't work once deployed.
As specified here, I've added these lines in my main .py file:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModel.from_pretrained("facebook/bart-large-mnli")
Here's the error message I get:
OSError: Can't load weights for 'facebook/bart-large-mnli'. Make sure that: - 'facebook/bart-large-mnli' is a correct model identifier listed on 'https://huggingface.co/models' - or 'facebook/bart-large-mnli' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.
… So I'm not sure if I'm doing this properly.
Thanks in advance
Charly
Hi @charly, are you still facing this issue?
Thanks for asking @valhalla!
Yes, I still have the issue. Let me retry this week - I'll keep you posted!
That's strange. What version of transformers are you using? I believe BART used to be a "canonical" model in the library before it was moved to the facebook org, so try just bart-large-mnli without the facebook/ prefix if you're using an older version of transformers.
Btw, you'll want to use AutoModelForSequenceClassification rather than AutoModel so that you get the NLI output layer and not just the encoder.
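If you want to stick with the manual approach from the blog post, a rough sketch of what I mean looks like this (the facebook/bart-large-mnli head's label order is contradiction / neutral / entailment; the premise and label are just examples):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

premise = "one day I will see the world"    # the sequence to classify
hypothesis = "This example is travel."      # one hypothesis per candidate label

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs)[0]             # shape (1, 3): contradiction, neutral, entailment

# Drop the neutral logit and softmax over contradiction vs. entailment;
# the entailment probability is the score for this label.
entail_contradiction_logits = logits[:, [0, 2]]
probs = entail_contradiction_logits.softmax(dim=1)
print(f"probability that the label applies: {probs[:, 1].item():.3f}")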
Thanks @joeddav,
I've finally managed to get it working, although not yet in a GPU set-up.
By the way, when I remove the facebook/ prefix from facebook/bart-large-mnli, I get the following issue:
Thanks,
Charly
Can you explain in more detail why it is better to use AutoModelForSequenceClassification rather than AutoModel to get more accurate predictions with this approach?
Hi @m2rik, I'm curious to see your implementation. I tried the following approach with a GPU in Google Colab: for 20,000 rows (just one sentence per row) and 15 classes, it took 56 minutes.
zsc = pipeline(task='zero-shot-classification', tokenizer=tokenizer, model=model, device=0)
batch_size = 128
sequences = df['idea'].to_list()
list_of_ideas = []
for i in range(0, len(sequences), batch_size):
    list_of_ideas += zsc(sequences[i:i+batch_size], candidate_labels=candidate_labels, multi_class=True)
CPU times: user 31min 20s, sys: 24min 38s, total: 55min 58s
Wall time: 55min 58s
Any help is really appreciated.
Thanks.
@nayid You may have seen this already, but I'd use the distilled valhalla/distilbart-mnli-12-3 instead of the default model if you're trying to speed things up. You should get a good boost in speed/memory and it seems to have similar accuracy.
@joeddav thank you!!
Great speed improvement, almost half the time.
CPU times: user 18min 4s, sys: 13min 37s, total: 31min 41s
Wall time: 31min 42s
No-teacher distillation is really effective!
The distillation paper is out if anyone is interested.
Hello, I want to use the pipeline on my PC with Docker, but the container gets killed. Do you have a solution, please? @joeddav
Recreating classification_api_1 ... done
Attaching to classification_api_1
api_1 | /app/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
api_1 | return torch._C._cuda_getDeviceCount() > 0
Downloading: 100%|██████████| 908/908 [00:00<00:00, 205kB/s]
Downloading: 100%|██████████| 1.63G/1.63G [10:19<00:00, 2.63MB/s]
api_1 | Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartModel: ['model.encoder.version', 'model.decoder.version']
api_1 | - This IS expected if you are initializing BartModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
api_1 | - This IS NOT expected if you are initializing BartModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading: 100%|██████████| 899k/899k [00:01<00:00, 834kB/s]
Downloading: 100%|██████████| 456k/456k [00:00<00:00, 658kB/s]
Killed
Looks like you're trying to use CUDA without a GPU / proper CUDA installation?
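If you just want to force the pipeline onto the CPU, you can be explicit about it when building it (a minimal sketch; device=-1 selects the CPU, device=0 would be the first GPU):

from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli",
                      device=-1)  # run on CPU explicitly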