Hi,
Can you give me tips on how to make zero-shot pipeline inference faster?
My current approach is to reduce the model size/parameter count
(training a "base" model instead of a "large" model).
Is there another approach?
There's some discussion in this topic that you could check out.
Here are a few things you can do:
Try a distilled model such as valhalla/distilbart-mnli-12-3 (models can be specified by passing e.g. pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3") when you construct the pipeline).
Pass device=0 to the pipeline factory to utilize CUDA.
Thanks for the advice @joeddav, and thanks to @valhalla for the amazing project on distilled models.
The distilled model seems interesting; I will look into it.
It would be great if it supported more models, such as xlm-roberta.
Anyway, I'm trying to train/reproduce your XLM model,
but using the base model instead of the large one to improve inference speed.
I will maybe try this step.
I'm using XLM-R because it supports more languages.
Any tips that I should be aware of?
Thanks again.
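For reference, here is a minimal sketch of combining the two suggestions from earlier in the thread (a distilled checkpoint plus device=0). The helper names build_pipeline_kwargs and load_classifier are hypothetical and only there for illustration; the pipeline call itself needs transformers and a backend such as PyTorch installed, so it is imported lazily.

```python
def build_pipeline_kwargs(model_name="valhalla/distilbart-mnli-12-3", use_gpu=True):
    """Collect the arguments for the zero-shot pipeline (hypothetical helper)."""
    return {
        "task": "zero-shot-classification",
        "model": model_name,
        # device=0 selects the first CUDA GPU; -1 keeps the pipeline on CPU.
        "device": 0 if use_gpu else -1,
    }


def load_classifier(**overrides):
    """Build the pipeline; `overrides` are extra keyword arguments for pipeline()."""
    # Imported here so the helper above can be used without transformers installed.
    from transformers import pipeline

    kwargs = build_pipeline_kwargs()
    kwargs.update(overrides)
    return pipeline(**kwargs)


# Example usage (downloads the checkpoint on first run; the sentence and labels
# are made up for illustration):
#
#   classifier = load_classifier()
#   result = classifier(
#       "The new phone has a great camera.",
#       candidate_labels=["technology", "sports", "politics"],
#   )
#   print(result["labels"][0], result["scores"][0])
```

How much faster the distilled checkpoint is will depend on your hardware and inputs, so it is worth timing it against the default model on your own data.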
I noticed, using the zero-shot-classification pipeline, that loading the model (i.e. this line: classifier = pipeline("zero-shot-classification", device=0)) takes about 60 seconds, but inference afterward is quite fast. Is there a way to speed up the model/tokenizer loading process? Thanks!
The pipeline actually loads the model twice: once in the get_framework function and then again on line 3296.
For now, if you want, you could just hardcode the framework as pt and remove the call to get_framework to load the model once.
Seems like we should add a utility function to file_utils.py to check whether a tf/pt model file exists at a path without having to download it, so that we don't have to do this. Thoughts from @sgugger or @lysandre maybe?
I'm confused about what the question is. You can pass a local path to XyzModel.from_pretrained and it won't download anything.
Awesome, thanks! I made the change below and load time dropped from 61 seconds to 32 seconds:
classifier = pipeline("zero-shot-classification", device=0) ----> classifier = pipeline("zero-shot-classification", framework="pt", device=0)
@sgugger the issue is that, whether the model is local or not, the pipeline loads it twice, which adds significant time for big models like bart-large.
Yes, so this has nothing to do with file_utils.
I should have been clearer. The problem is that the get_framework function in the pipelines implementation determines the framework with a try/except, attempting to load the model in PyTorch and, if that fails, loading it with TensorFlow. But then it just throws the model away, so the pipeline constructor has to load it again later.
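To make the double load concrete, here is a simplified sketch of the pattern described above. This is not the actual transformers source: load_pt, load_tf, get_framework_by_loading and build_pipeline are hypothetical stand-ins, and the "models" are dummy objects so the load count can be observed without any real weights.

```python
load_calls = []  # records every (framework, model_name) load for inspection


def load_pt(model_name):
    """Stand-in for an expensive PyTorch model load."""
    load_calls.append(("pt", model_name))
    return object()


def load_tf(model_name):
    """Stand-in for an expensive TensorFlow model load."""
    load_calls.append(("tf", model_name))
    return object()


def get_framework_by_loading(model_name):
    """Detect the framework by actually loading the model, then discard it."""
    try:
        load_pt(model_name)  # the loaded model is thrown away here
        return "pt"
    except Exception:
        load_tf(model_name)
        return "tf"


def build_pipeline(model_name, framework=None):
    """Load the model, detecting the framework only if it was not given."""
    if framework is None:
        framework = get_framework_by_loading(model_name)  # first load
    loader = load_pt if framework == "pt" else load_tf
    return loader(model_name)  # second load (or the only one if framework was given)


build_pipeline("some-model")
print(len(load_calls))  # two loads when the framework has to be detected

load_calls.clear()
build_pipeline("some-model", framework="pt")
print(len(load_calls))  # one load when the framework is passed explicitly
```

This is why passing framework="pt" roughly halved the load time reported above: the detection step, and its throwaway load, is skipped.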
What I was trying to say is: would it make sense to have a utility in file_utils.py that can tell you whether a file exists without having to download the whole thing? In this case, it would allow us to check whether a model file exists for a particular framework (e.g. pytorch_model.bin) without having to wait for the large file to download if it does. I imagine that could be useful in other places too, but I'm not the expert, so I thought I'd see what you thought. If it wouldn't be useful elsewhere, there's probably an easier workaround without leaving pipelines.py.
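As a rough illustration of the idea, an existence check can be done with an HTTP HEAD request, which asks the server for the headers only and never transfers the file body. This is a generic standard-library sketch, not an existing file_utils.py function: remote_file_exists, its opener parameter, and the example URL are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def remote_file_exists(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if `url` answers a HEAD request with a 2xx status.

    Hypothetical helper: `opener` is injectable so the function can be
    exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # the file is simply not there
        raise  # other errors (auth, server failure) are not "missing"


# Example usage (placeholder URL, not a real model location):
#
#   url = "https://example.com/models/my-model/pytorch_model.bin"
#   framework = "pt" if remote_file_exists(url) else "tf"
```

A check like this would let the framework be chosen from which weight file is present, instead of loading the whole model just to find out.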
(sorry for the belated response)
Understood now! This function could certainly be useful, yes.