NLP pretrained model doesn’t use GPU when making inference

yashugupta786 · October 7, 2020, 6:01am

I am using the Marian MT pretrained model for inference on a machine translation task, integrated with a Flask service. I am running the model on a CUDA-enabled device. During inference the model does not use the GPU; it uses only the CPU. I don’t want to use the CPU for inference because it takes a very long time to process a request. Even a single sentence takes very long. Please help with this. Below are the code snippet and the model I am using:

from transformers import MarianMTModel, MarianTokenizer

# src_text is the list of source sentences to translate
model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

I have downloaded pytorch_model.bin and the other tokenizer files from the S3 environment and saved them locally … Please help me put things on the GPU for faster inference.

Karthik12 · October 7, 2020, 6:09am

Have you tried model.to('cuda') to make the model use the GPU?

yashugupta786 · October 7, 2020, 6:14am

Hi Karthik

Thanks for replying. Yes, I have tried that. Please find the snippet below and correct me where I am going wrong:

import torch
from transformers import MarianMTModel, MarianTokenizer

torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(torch_device)

model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name).to(torch_device)
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Thanks in advance

Karthik12 · October 7, 2020, 6:39am

You have the model on the GPU, but what about the tensors the tokenizer returns?

Change this line:
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))

To:
translated = model.generate(**tokenizer.prepare_translation_batch(src_text).to('cuda'))
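The change above can be sketched end to end as follows. This is only a minimal sketch, not a confirmed fix for the original poster's setup. It assumes a recent transformers release, where calling the tokenizer directly (`tokenizer(src_text, return_tensors="pt", padding=True)`) takes the place of `prepare_translation_batch`; on the older version used in this thread, keep `prepare_translation_batch(src_text).to(device)` instead. The helper names `pick_device` and `translate` are hypothetical.

```python
from typing import List


def pick_device(cuda_available: bool) -> str:
    """Return the device string to use for both the model and its inputs."""
    return "cuda" if cuda_available else "cpu"


def translate(src_text: List[str],
              model_name: str = "Helsinki-NLP/opus-mt-ROMANCE-en") -> List[str]:
    # Heavy dependencies are imported here so the module loads without them.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    device = pick_device(torch.cuda.is_available())
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name).to(device)

    # The tokenizer is a plain Python object and has no .to() method;
    # only the batch of tensors it returns is moved to the device.
    batch = tokenizer(src_text, return_tensors="pt", padding=True).to(device)

    # Model weights and inputs must sit on the same kind of device.
    assert next(model.parameters()).device.type == batch["input_ids"].device.type

    translated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in translated]


if __name__ == "__main__":
    print(translate(["Bonjour le monde"]))
```

Note that this sketch loads the model inside `translate` for brevity; in a Flask service it would normally be loaded once at startup rather than per request.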

yashugupta786 · October 7, 2020, 6:48am

Sure Karthik, thanks.
I will check. Should I also change the line below,

tokenizer = MarianTokenizer.from_pretrained(model_name).to('cuda')

or only the line you mentioned?

translated = model.generate(**tokenizer.prepare_translation_batch(src_text).to('cuda'))

Karthik12 · October 7, 2020, 6:51am

tokenizer = MarianTokenizer.from_pretrained(model_name).to('cuda'): do you get an error here, such as "'MarianTokenizer' object has no attribute 'to'"? If so, use the line I gave above.
Try it out.

yashugupta786 · December 2, 2020, 6:53am

Hi @Karthik12, I am able to use the GPU now, but inference is still very slow and time-consuming. Is there a way to make inference faster? I made the changes you suggested, yet it still takes a long time to process one request.
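The thread does not identify a cause for the remaining slowness. As a starting point for narrowing it down, and not a diagnosis, here is a sketch that translates sentences in batches with an already-loaded model and measures the elapsed time. It assumes the model and tokenizer are loaded once (for example at Flask startup) and passed in, and that a recent transformers release is installed. The first CUDA call usually includes one-time initialization, so it may be worth timing a second call as well. The names `chunked` and `timed_translate` and the default `batch_size` are hypothetical.

```python
import time
from typing import List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def timed_translate(model, tokenizer, sentences: Sequence[str],
                    device: str, batch_size: int = 16) -> Tuple[List[str], float]:
    """Translate in batches and return (translations, elapsed seconds)."""
    import torch  # imported here so the helper above has no heavy dependency

    results: List[str] = []
    start = time.perf_counter()
    for batch in chunked(sentences, batch_size):
        # Inputs must be on the same device as the model.
        inputs = tokenizer(batch, return_tensors="pt", padding=True).to(device)
        output_ids = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    if device.startswith("cuda"):
        # Make sure all queued GPU work has finished before stopping the clock.
        torch.cuda.synchronize()
    return results, time.perf_counter() - start
```

Comparing the elapsed time for `device="cpu"` and `device="cuda"` on the same sentences shows whether the GPU is helping at all for a given request size.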

fciannel · December 30, 2020, 9:42pm

Hi guys, I am having the same issue. Did you figure out what the cause is? Execution is very slow, although the model seems to be on the GPU. It used to behave differently on GPU, although I don’t recall exactly which transformers version I was using.

R00 · June 27, 2021, 4:53pm

Hello, have you managed to solve the slow translation issue?

yashugupta786 · November 23, 2021, 11:48am

Still getting the same speed. Have you managed to get faster inference?

yashugupta786 · November 23, 2021, 11:49am

Have you managed to improve the inference speed on GPU for Marian?

raphaelmerx · March 11, 2022, 6:30am

Inference is markedly faster for me on GPU, using the device option to pipeline:

from transformers import pipeline

model_checkpoint = "Helsinki-NLP/opus-mt-ROMANCE-en"
# device=0 selects the first CUDA GPU
translator = pipeline("translation", model=model_checkpoint, device=0)
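For completeness, here is a sketch of how the pipeline approach might be used from a service. It assumes the translation pipeline returns a list of dicts with a `translation_text` key, and that `device=-1` selects the CPU when no GPU is present. The helper names `device_index`, `extract_texts`, and `build_translator` are hypothetical.

```python
from typing import Dict, List


def device_index(cuda_available: bool) -> int:
    """Pipeline device argument: 0 is the first CUDA GPU, -1 is the CPU."""
    return 0 if cuda_available else -1


def extract_texts(outputs: List[Dict[str, str]]) -> List[str]:
    """Pull the translated strings out of the pipeline's list of dicts."""
    return [item["translation_text"] for item in outputs]


def build_translator(model_checkpoint: str = "Helsinki-NLP/opus-mt-ROMANCE-en"):
    # Heavy dependencies are imported here so the helpers above stay lightweight.
    import torch
    from transformers import pipeline

    return pipeline(
        "translation",
        model=model_checkpoint,
        device=device_index(torch.cuda.is_available()),
    )


if __name__ == "__main__":
    # Build the pipeline once and reuse it for every request.
    translator = build_translator()
    outputs = translator(["Bonjour le monde", "Hola mundo"])
    print(extract_texts(outputs))
```

Building the translator once and reusing it avoids reloading the model weights on each request.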
