Way to make zero-shot pipeline inference faster?

There’s some discussion in this topic that you could check out.

Here are a few things you can do:

  • Try out one of the community-uploaded distilled models on the Hub (thx @valhalla). I’ve found them to get pretty similar performance on zero-shot classification, and some of them are much smaller and faster. I’d start with valhalla/distilbart-mnli-12-3. You can specify the model when you construct the pipeline, e.g. pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3") (see the first sketch after this list).
  • If you’re on GPU, make sure you’re passing device=0 to the pipeline factory to utilize CUDA (also shown in the first sketch below).
  • If you’re on CPU, try running the pipeline with ONNX Runtime; you should get a nice boost. Here’s a project (thx again @valhalla) that lets you use HF pipelines with ORT automatically. A rough ONNX sketch also follows the list.
  • If you have a lot of candidate labels, try to get clever about passing just the most likely ones to the pipeline. Passing a large number of labels for each sequence will really slow you down, since every sequence/label pair has to be passed through the model together. If you have 100 possible labels but can use some kind of heuristic or simpler model to narrow them down first, that will help a lot (see the pre-filtering sketch below).
  • Use mixed precision. This is pretty easy if you’re on PyTorch 1.6 or later (sketch at the end of this post).
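
For the first two points, here’s a minimal sketch of constructing the pipeline with the distilled model and putting it on GPU. The input text and labels are just placeholders:

```python
from transformers import pipeline

# Distilled MNLI model: smaller and faster than bart-large-mnli,
# with similar zero-shot accuracy in my experience.
# device=0 puts the model on the first CUDA GPU; omit it (or use -1) for CPU.
classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-3",
    device=0,
)

result = classifier(
    "The new update makes the app crash on startup.",
    candidate_labels=["bug report", "feature request", "praise"],
)
print(result["labels"][0], result["scores"][0])
```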
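I won’t reproduce the linked project here, but as one illustration of the ONNX route, this sketch uses Hugging Face’s Optimum library (a separate tool from the project mentioned above) to export the model to ONNX and wrap it back into a pipeline. Treat the exact API (ORTModelForSequenceClassification, export=True) as an assumption to verify against the Optimum docs for your version:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "valhalla/distilbart-mnli-12-3"

# export=True converts the PyTorch checkpoint to ONNX on the fly;
# ONNX Runtime then executes the exported graph (CPU by default).
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
print(classifier("I loved this movie!", candidate_labels=["positive", "negative"]))
```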
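For pre-filtering labels, here’s a sketch of one possible heuristic: use a cheap sentence-embedding model to rank all the candidate labels, then run only the top few through the zero-shot pipeline. The sentence-transformers model name and the top_k value are arbitrary choices for illustration, not recommendations:

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Cheap bi-encoder to pre-rank labels: one encode per label is far
# faster than one full NLI forward pass per sequence/label pair.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-3",
)

def classify_with_prefilter(text, all_labels, top_k=10):
    # Rank every candidate label by embedding similarity to the text...
    text_emb = embedder.encode(text, convert_to_tensor=True)
    label_embs = embedder.encode(all_labels, convert_to_tensor=True)
    scores = util.cos_sim(text_emb, label_embs)[0]
    shortlist = [all_labels[i] for i in scores.topk(top_k).indices]
    # ...then pay the expensive NLI pass only for the shortlist.
    return classifier(text, candidate_labels=shortlist)
```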
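And for mixed precision, a sketch using torch.cuda.amp.autocast (introduced in PyTorch 1.6) wrapped around the pipeline call. This assumes you’re on GPU; autocast won’t help on CPU:

```python
import torch
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="valhalla/distilbart-mnli-12-3",
    device=0,
)

# autocast runs eligible ops in float16, which can substantially cut
# inference time on GPUs with tensor cores (e.g. V100/T4 and newer).
with torch.cuda.amp.autocast():
    result = classifier(
        "The battery drains way too fast.",
        candidate_labels=["hardware issue", "software issue", "billing"],
    )
```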