What is the most efficient way to run predictions (zero-shot classification) on a huge dataset?

I have a pretty large dataset (2 million records) that consists of 2 columns:

  1. Text (up to 3-4 words, usually short)
  2. Labels for prediction (up to 3-4 words as well)

What I want to do is apply a pretrained RoBERTa model for zero-shot classification. Here is how I did it:

from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# convert pandas DataFrame to a Dataset
dataset = Dataset.from_pandas(data)

# load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained('joeddav/xlm-roberta-large-xnli')
tokenizer = AutoTokenizer.from_pretrained('joeddav/xlm-roberta-large-xnli')

classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer, framework='pt')

# define the prediction function and apply it to the dataset
def prediction(record, classifier):
    hypothesis_template = "Im Text geht es um {}"
    output = classifier(record['text'], record['label'], hypothesis_template=hypothesis_template)
    record['prediction'] = output['labels'][0]
    record['scores'] = output['scores'][0]
    return record

dataset = dataset.map(lambda x: prediction(x, classifier=classifier))

But I am not sure this is the most efficient way to run inference (unfortunately it processes approximately 2-3 records per second, which is too slow). The official docs (Pipelines) say that I should avoid batching if I am using a CPU. Still, my questions are:

  1. Is the pipeline wrapper fast enough, or should I stick to something 'lower level' (for example, native PyTorch)?
  2. Is inference through .map considered good practice? If not, what should be used instead?
  3. Given relatively short texts (maximum 5-6 words), should batching be used instead of processing one record at a time?
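For context on question 3, this is the kind of batched call I had in mind. It is only a minimal sketch under assumptions: the `batches` helper is my own, and the shared `candidate_labels` list is a simplification, since in my real data each record carries its own labels.

```python
from itertools import islice

def batches(iterable, n):
    """Yield successive lists of up to n items from iterable."""
    it = iter(iterable)
    while chunk := list(islice(it, n)):
        yield chunk

# Hypothetical batched inference: transformers pipelines accept a list of
# sequences, so each call would process a whole chunk of texts at once, e.g.:
#
#   for chunk in batches(texts, 32):
#       outputs = classifier(chunk, candidate_labels,
#                            hypothesis_template=hypothesis_template)
```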