I use TrOCR for text recognition on administrative documents.
Each line of the text is segmented and converted to an image for inference.
I implemented a process that runs inference on this batch of images in parallel to speed up execution:
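import ray
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
# is_available() must be called; the bare function object is always truthy
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed').to(device)
images = []
for img_file in image_files:
    images.append(img_file)
# unitary_ocr_trocr is a Ray remote function (not shown here)
futures = [unitary_ocr_trocr.remote(img_path, processor, model, device) for img_path in images]
ray.get(futures)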
The code runs on a GPU and freezes without any error message.
My questions:
Is it possible to implement such a solution?
Is the model thread-safe?
Is it possible to submit a batch of images to the processor instead of one image at a time? I have seen that it has an images argument.
Thank you in advance for your help and support
Batched generation can be done by creating pixel values of shape (batch_size, num_channels, height, width) and passing them to the model. To do so, pass a list of images to the processor:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed').to(device)
# pass a list of images (image1 and image2 are assumed to be loaded beforehand, e.g. as PIL images)
pixel_values = processor(images=[image1, image2], return_tensors="pt").pixel_values.to(device)
# next, do batched generation
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
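If there are more line images than fit on the GPU in one go, the same call can be applied to fixed-size chunks of the list. Below is a minimal sketch of that idea, not a tuned implementation. It assumes `images` is a list of already loaded images and that `processor`, `model`, and `device` are created as above. The helper names `chunked` and `ocr_in_batches` and the default `batch_size=8` are hypothetical choices for illustration, not part of transformers.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ocr_in_batches(images, processor, model, device, batch_size: int = 8) -> List[str]:
    """Run batched generation chunk by chunk; returns one string per image, in order."""
    texts: List[str] = []
    for batch in chunked(images, batch_size):
        # prepare one tensor of shape (len(batch), num_channels, height, width)
        pixel_values = processor(images=list(batch), return_tensors="pt").pixel_values.to(device)
        # a single generate call handles the whole chunk
        generated_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

It would be called as `texts = ocr_in_batches(images, processor, model, device, batch_size=16)`. A larger `batch_size` needs more GPU memory per call, so the value probably has to be adjusted to the card in use.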
Hi,
I have implemented the solution, and it works fine.
However, the overall duration is roughly the same as the sum of the inference times for each image. I expected the inference to be parallelized.
On top of this, if I put the code in a loop, the first batch of images runs fine but the second run fails because the CUDA memory is not freed. Is there a specific command for this?
Thanks so much for your reply.