Using multiple GPUs for the zero-shot-classification pipeline with the bart-large-mnli model

I am using Facebook’s bart-large-mnli for zero-shot classification. I have 4 “Nvidia Tesla V100-PCIE-16GB” GPUs available in my environment. I have around 500K different texts in a pandas dataframe that I would like to classify against a set of candidate classes. I am currently using pandas apply, and each row/text takes 1.5 seconds to process. On one GPU I see 27% usage, with 1966MiB of the 16384MiB available in use, and the remaining 3 GPUs are not being used at all. How do I pass multiple texts at a time so that all of my data is processed efficiently?
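To make the question concrete, here is a minimal sketch of what "passing multiple texts at a time" could look like. It is not a tested solution. The helper names (`chunked`, `shard`, `classify_in_batches`, `build_classifier`) and the batch size of 32 are assumptions made for illustration. The only library call used is `transformers.pipeline` with its `device` argument, and the pipeline is called with a list of texts plus `candidate_labels`.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def shard(items: Sequence, num_shards: int) -> List[list]:
    """Split `items` into `num_shards` contiguous parts (e.g. one per GPU)."""
    size = -(-len(items) // num_shards)  # ceiling division
    return [list(items[i * size:(i + 1) * size]) for i in range(num_shards)]


def classify_in_batches(
    texts: Sequence[str],
    labels: Sequence[str],
    classifier: Callable,
    batch_size: int = 32,  # assumed value; tune to fit GPU memory
) -> List[dict]:
    """Send lists of texts to the classifier instead of one row at a time."""
    results: List[dict] = []
    for batch in chunked(texts, batch_size):
        out = classifier(list(batch), candidate_labels=list(labels))
        # Some versions return a bare dict when the list has one element.
        if isinstance(out, dict):
            out = [out]
        results.extend(out)
    return results


def build_classifier(device: int):
    """Create a zero-shot pipeline on one GPU (device index 0..3 here)."""
    from transformers import pipeline  # imported lazily; needs transformers

    return pipeline(
        "zero-shot-classification",
        model="facebook/bart-large-mnli",
        device=device,
    )
```

With this layout, `shard(df["text"].tolist(), 4)` would give one contiguous part per GPU (the column name `text` is hypothetical). Each part could then be handled by its own process that calls `build_classifier(device=i)` and `classify_in_batches`. Because the shards are contiguous, concatenating the per-shard results restores the original row order. The process management itself is left out of the sketch.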