Custom DistilBERT does not use CUDA for prediction


I am using the model classifier = pipeline("text-classification", model='bhadresh-savani/distilbert-base-uncased-emotion', return_all_scores=True) for emotion classification of my dataset of 1.2 million pieces of client feedback. I am not training the model, only running prediction. I noticed that the model runs on the CPU and does not use CUDA (I have an RTX 5000), so the prediction takes ages to compute.
Can you explain why this is the case? Is there a way to use CUDA for this model's predictions?

Thank you

I found a solution: add the parameter device=0. I still have to classify in small batches because GPU memory is a serious limit, even though my GPU has 16 GB.
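As a rough sketch of the fix described above: with the setup in this thread, the pipeline stays on the CPU unless a device is passed, and device=0 selects the first CUDA GPU. The helper names pick_device and build_classifier are made up for illustration, and the fallback to -1 (CPU) when no GPU is visible is an assumption added here, not part of the original post. Output options such as return_all_scores are left out.

```python
def pick_device(cuda_available):
    """Return the pipeline device index: 0 = first CUDA GPU, -1 = CPU."""
    return 0 if cuda_available else -1


def build_classifier(model_name="bhadresh-savani/distilbert-base-uncased-emotion"):
    """Build the text-classification pipeline on the GPU when one is available."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from transformers import pipeline

    device = pick_device(torch.cuda.is_available())
    return pipeline("text-classification", model=model_name, device=device)
```

Calling build_classifier() downloads the model on first use, so it needs network access and enough GPU memory for the chosen batch sizes.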

Very interesting, I had the same issue. How do you change the batch size with a pipeline?

This is probably primitive, but it works: I did it in a loop. It took six hours to classify 1.2 million reviews, some of them quite long, on a GPU with 16 GB of memory.

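from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="typeform/distilbert-base-uncased-mnli",
                      device=0)  # device=0 puts the model on the first CUDA GPU

# sentences: the list of review strings, loaded elsewhere
seq_len = 100  # number of sentences sent to the classifier per call
emo_zs_labels = list()
ind = 0
lenss = len(sentences)

while ind < lenss:
    bb = ind
    ee = bb + seq_len if bb + seq_len < lenss else lenss
    pred = classifier(sentences[bb:ee],
                      candidate_labels=["sadness", "joy", "love", "anger", "fear", "surprise"])
    temp_labels = [x['labels'][0] for x in pred]  # top-scoring label per sentence
    emo_zs_labels.extend(temp_labels)  # collect the labels of this chunk
    ind = ee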

I experimented with seq_len: 1000 and 500 both caused CUDA out-of-memory errors, so I finally set it to 100.
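The trial and error with seq_len could in principle be automated. The sketch below is not what was used in this thread: it is a hypothetical helper, classify_with_backoff, that starts with a large chunk and halves it whenever the call fails with an out-of-memory error. It assumes the error surfaces as a RuntimeError whose message contains "out of memory", which is how PyTorch usually reports CUDA OOM, and it takes any callable that maps a list of sentences to a list of results.

```python
def classify_with_backoff(classify, sentences, chunk_size=1000, min_chunk=1):
    """Run classify(chunk) over all sentences, halving chunk_size after an OOM error.

    Returns the collected results and the chunk size that finally worked.
    """
    results = []
    start = 0
    while start < len(sentences):
        chunk = sentences[start:start + chunk_size]
        try:
            results.extend(classify(chunk))
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or if we cannot shrink further.
            if "out of memory" not in str(err).lower() or chunk_size <= min_chunk:
                raise
            chunk_size = max(min_chunk, chunk_size // 2)
            continue  # retry the same position with a smaller chunk
        start += len(chunk)
    return results, chunk_size
```

With a real pipeline, classify could be something like lambda chunk: classifier(chunk, candidate_labels=labels). Freeing cached GPU memory between retries may also be needed in practice; that is left out here.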

Got it, you created the batches yourself. I was curious to know whether the pipeline would handle this, but apparently it does not.
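For what it is worth, more recent releases of transformers document a batch_size argument on pipeline calls, so depending on the installed version the manual loop may not be needed. A minimal sketch, assuming such a version; classify_all is a made-up wrapper name and 32 is an arbitrary starting value, not a recommendation:

```python
def classify_all(classifier, sentences, batch_size=32):
    """Let the pipeline do the batching by forwarding batch_size on the call."""
    return classifier(sentences, batch_size=batch_size)
```

The out-of-memory limit still applies, so batch_size has to be tuned to the GPU and to the length of the texts, much like seq_len in the loop.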

Just a question: what is the advantage of using a pipeline vs. using a tokenizer, loading the model, fine-tuning it and finally classifying? Have you tried?

Yes. A pipeline takes just a few lines of code to train or predict. But with the tokenizer etc. handled separately, I feel I have more control over the hyperparameters.

Interesting, I thought pipeline had few hyperparameters… How do you access/change them in your example?

That is what I am saying. If you handle the tokenizer, model and trainer separately, you have better control of the parameters, and even more if you do it in native PyTorch. But then instead of 5 lines of code you have 100+.
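To make the trade-off concrete, here is a sketch of prediction only (no fine-tuning) with the tokenizer and model loaded separately. It is an illustration, not code from this thread: the function names, the batch size of 32 and the choice to truncate long inputs are all assumptions. The point is that padding, truncation, batch size and device are explicit here instead of being handled inside the pipeline.

```python
def batched(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_emotions(texts,
                     model_name="bhadresh-savani/distilbert-base-uncased-emotion",
                     batch_size=32):
    """Return the top label for each text, using the tokenizer and model directly."""
    # Imported lazily so batched() can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
    model.eval()  # inference mode: disables dropout

    labels = []
    for batch in batched(texts, batch_size):
        # Pad to the longest text in the batch and cut off over-long inputs.
        enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt").to(device)
        with torch.no_grad():  # no gradients needed for prediction
            logits = model(**enc).logits
        for idx in logits.argmax(dim=-1).tolist():
            labels.append(model.config.id2label[idx])
    return labels
```

This is already several times longer than the pipeline version, which is the cost of the extra control.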
