I am using a pretrained MarianMT model for inference on a machine translation task, integrated with a Flask service. I am running the model on a CUDA-enabled device, but during inference it is not using the GPU, only the CPU. I don't want to run inference on the CPU because it takes a very long time to process a request; even a single sentence takes very long. Please help with this. Below is the code snippet and the model I am using:
from transformers import MarianMTModel, MarianTokenizer

model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)

# src_text is the list of source sentences to translate
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
I have downloaded the pytorch_model.bin and the other tokenizer files from S3 and saved them locally. Please help me with how I can put things on the GPU for faster inference.
Sure Karthik, thanks.
I will check. Also, should I change this line as well,
tokenizer = MarianTokenizer.from_pretrained(model_name).to('cuda')
or this one,
translated = model.generate(**tokenizer.prepare_translation_batch(src_text).to('cuda'))
or only the one you told me about above?
tokenizer = MarianTokenizer.from_pretrained(model_name).to('cuda') - do you get an error here, like "'MarianTokenizer' object has no attribute 'to'"? If so, leave the tokenizer on the CPU and only move the model and the input batch to the GPU, as I mentioned.
Try it out.
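For reference, here is a minimal sketch of the setup being discussed (assuming a CUDA device is available; the tokenizer stays on the CPU, while the model and the encoded batch tensors are moved to the GPU - the device check, the example src_text sentence, and the .to(device) calls on the batch are illustrative additions, not quoted from the original snippet):

import torch
from transformers import MarianMTModel, MarianTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)       # tokenizer has no .to(); it runs on the CPU
model = MarianMTModel.from_pretrained(model_name).to(device)  # move the model weights to the GPU

src_text = ['¿Dónde está la estación?']                        # example input sentence

# prepare_translation_batch returns a dict of tensors; move each tensor to the model's device
batch = tokenizer.prepare_translation_batch(src_text)
batch = {k: v.to(device) for k, v in batch.items()}

translated = model.generate(**batch)
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
print(tgt_text)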
Hi @Karthik12, I am able to use the GPU now, but inference is still very slow and time-consuming. Is there a way to make inference faster? I have made the same changes you suggested, yet it still takes a long time to process one request.
Hi guys, I am having the same issue, did you figure out what it is? Execution is very slow, although the model seems to be on a GPU. It used to be different on GPU before, although I don't recall exactly which transformers version I was using.
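A few things that generally help with generation speed in this kind of setup (a rough sketch rather than a confirmed fix for this thread - the batching, the torch.no_grad() usage, and the num_beams=1 choice are illustrative assumptions):

import torch
from transformers import MarianMTModel, MarianTokenizer

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_name = 'Helsinki-NLP/opus-mt-ROMANCE-en'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name).to(device)
model.eval()                                       # inference mode: disables dropout

# Translate several sentences per call instead of one request per sentence
src_text = ['primera frase.', 'segunda frase.']

with torch.no_grad():                              # no gradients needed for inference; saves memory and time
    batch = tokenizer.prepare_translation_batch(src_text)
    batch = {k: v.to(device) for k, v in batch.items()}
    # greedy decoding; faster than beam search if the model's default config uses several beams
    translated = model.generate(**batch, num_beams=1)

tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]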