Hi @nielsr, I am following your TrOCR fine-tuning with PyTorch, but I have two questions. The processor resizes the image from 1700 x 134 to 384 x 384: 1) is there a way to maintain the height of the original image, or even use a custom dimension for training, e.g. 512 x 134? And 2) is there a way to get the original image back for logging purposes, as the processed image is unrecognizable after those basic augmentations? Thanks
Hi,
thanks for your interest in TrOCR.
- what you can do is use torchvision.transforms.Resize, which lets you resize an image while keeping its aspect ratio (pass a single int and the shorter side is resized to it, with the other side scaled proportionally).
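For a wide text-line image like yours (1700 x 134), the shorter side is the height, so resizing the shorter side fixes the height while the width follows. A minimal sketch of that logic using only PIL (it mirrors what torchvision.transforms.Resize does when given a single int; the function name and the target height of 64 are just illustrative choices):

```python
from PIL import Image

def resize_keep_aspect(image, target_height):
    # Scale the height to target_height and let the width follow
    # proportionally, so the aspect ratio is preserved.
    w, h = image.size
    new_w = round(w * target_height / h)
    return image.resize((new_w, target_height), Image.BILINEAR)

# Dummy image with the dimensions from the question.
img = Image.new("RGB", (1700, 134))
resized = resize_keep_aspect(img, 64)
print(resized.size)  # (812, 64)
```

Note that this only controls the resize step; the model still expects whatever resolution its position embeddings were trained for, which is what the interpolation flag below addresses.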
Technically you could use a different resolution during training (I guess you mean fine-tuning) by interpolating the position embeddings of the vision encoder to the custom resolution. This can be achieved by passing interpolate_pos_encoding=True to the forward of VisionEncoderDecoderModel, which makes sure the vision encoder (ViT) interpolates its position embeddings to the custom resolution.
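To make the interpolation concrete: TrOCR's ViT encoder was pretrained at 384 x 384 with 16 x 16 patches, i.e. a 24 x 24 grid of position embeddings. At a custom resolution the patch grid changes shape, and interpolate_pos_encoding=True tells ViT to bilinearly interpolate the 24 x 24 grid to the new one. A small sketch of the grid arithmetic (the helper name is mine; 128 x 1696 is just an example of a wide text-line resolution, and both sides must be multiples of the patch size):

```python
def pos_embed_grid(height, width, patch_size=16):
    # Number of position embeddings along each axis for a ViT-style
    # encoder: one embedding per patch_size x patch_size patch.
    if height % patch_size or width % patch_size:
        raise ValueError("each side must be a multiple of the patch size")
    return height // patch_size, width // patch_size

print(pos_embed_grid(384, 384))    # (24, 24) -> the pretraining grid
print(pos_embed_grid(128, 1696))   # (8, 106) -> a custom wide-image grid
```

So a forward pass like `model(pixel_values, labels=labels, interpolate_pos_encoding=True)` would resample the 24 x 24 embeddings to, e.g., 8 x 106 on the fly.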
- sure, that’s possible: you just need to unnormalize the image in order to visualize it. See here for a function that you can use to visualize transformations.
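Unnormalizing just inverts the `(x - mean) / std` step the processor applies. A minimal sketch, assuming the mean/std of 0.5 that TrOCR's default image processor uses (check your processor's `image_mean` / `image_std` and substitute if they differ):

```python
import numpy as np

def unnormalize(pixel_values, mean=0.5, std=0.5):
    # Invert (x - mean) / std, map back to 0-255 uint8, and move the
    # channel axis last (CHW -> HWC) so the array can be viewed/saved.
    img = pixel_values * std + mean
    img = np.clip(img * 255, 0, 255).astype(np.uint8)
    return img.transpose(1, 2, 0)

# Dummy normalized tensor shaped like the processor output (3, 384, 384);
# all zeros, i.e. every pixel at the normalization mean.
dummy = np.zeros((3, 384, 384), dtype=np.float32)
restored = unnormalize(dummy)
print(restored.shape)  # (384, 384, 3)
```

You can then pass the result to `PIL.Image.fromarray` to log it. The augmented geometry (crops, rotations, the resize itself) is of course not undone, only the normalization.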