Why does the TrOCR processor have a feature extractor?

Kforcode · November 17, 2021, 6:18am

When we are using an image transformer, why do we need a feature extractor? (The TrOCR processor is a feature extractor plus a RoBERTa tokenizer.)
I also looked at the image the processor outputs: it looks the same as the original image, only resized to a smaller shape.
@nielsr, is the processor doing any kind of image preprocessing?
I tried a few image preprocessing techniques, such as binarising the image, adding white space at the borders, and a bit of denoising, and they turned out to be of little to no help.
Could you please comment on that too?

nielsr · November 17, 2021, 8:31am

Hi,

Yes, models that take pixel values as input have a feature extractor defined, which will apply some basic image preprocessing (typically resize the image to a particular size + normalize the color channels).

TrOCR, for instance, expects every image to be of size 384x384.

Note that many models show better performance when image augmentations (such as random flipping, cropping, etc.) are introduced during training. These are not included in the feature extractors; for that you can use packages like torchvision or albumentations.
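To make the "resize + normalize" step concrete, here is a minimal NumPy-only sketch. It is not the library's implementation: the nearest-neighbour resize stands in for whatever interpolation the real feature extractor uses, and the 384x384 size and the per-channel mean/std of 0.5 are assumed values for illustration.

```python
import numpy as np

# Assumed values for illustration: target size and per-channel mean/std.
SIZE = 384
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def preprocess(image: np.ndarray, size: int = SIZE) -> np.ndarray:
    """Resize an (H, W, 3) uint8 image and normalize it to (3, size, size) floats."""
    height, width, _ = image.shape
    # Nearest-neighbour resize: pick one source row/column per target pixel.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = image[rows][:, cols]
    # Rescale to [0, 1], then normalize each color channel.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - MEAN) / STD
    # Move the channel axis first, the layout vision models usually expect.
    return normalized.transpose(2, 0, 1)


# A random stand-in for a wide text-line image.
image = np.random.randint(0, 256, size=(60, 500, 3), dtype=np.uint8)
pixel_values = preprocess(image)
print(pixel_values.shape)  # (3, 384, 384)
```

Nothing here is learned, which is why the processed image still looks like the original, just at a different size and value range.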

Kforcode · November 17, 2021, 8:53am

Thanks a lot for the explanation, @nielsr.
Could you also comment on why FeatureExtractor has a from_pretrained class method?
def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
Is it a model? I don’t see class AutoFeatureExtractor subclassing nn.Module.
And if it only has to “apply some basic image preprocessing (typically resize the image to a particular size + normalize the color channels)”, that could be done with a torchvision.transforms script, so what is AutoFeatureExtractor?
Is it a model that learns to do preprocessing? Where can I read about its architecture?

nielsr · November 17, 2021, 9:26am

Yes, feature extractors also have a from_pretrained method, which simply loads the same configuration as that of a particular checkpoint on the hub.

For example, if you do ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224"), it will make sure the size attribute of the feature extractor is set to 224. You could of course also just initialize it as feature_extractor = ViTFeatureExtractor(); in that case, the feature extractor’s size attribute will be 224 by default, as seen in the docs.

AutoFeatureExtractor is a class that saves people from having to specify a model-specific feature extractor. The Auto API loads the appropriate feature extractor when you just specify a model name from the hub. It’s a feature extractor, not a model. It takes care of the preprocessing.
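The idea that from_pretrained only restores a configuration can be illustrated with a toy class. This is a sketch, not the transformers code: ToyFeatureExtractor and its save_pretrained/from_pretrained methods are hypothetical stand-ins, and the only thing borrowed from the hub is the preprocessor_config.json file name.

```python
import json
import tempfile
from pathlib import Path


class ToyFeatureExtractor:
    """Hypothetical stand-in: holds preprocessing settings, no learned weights."""

    def __init__(self, size=224, image_mean=(0.5, 0.5, 0.5), image_std=(0.5, 0.5, 0.5)):
        self.size = size
        self.image_mean = list(image_mean)
        self.image_std = list(image_std)

    def save_pretrained(self, directory):
        # Write the settings as plain JSON; there are no tensors to save.
        Path(directory).mkdir(parents=True, exist_ok=True)
        with open(Path(directory) / "preprocessor_config.json", "w") as f:
            json.dump(vars(self), f)

    @classmethod
    def from_pretrained(cls, directory):
        # "Loading" is just reading the JSON back into constructor arguments.
        with open(Path(directory) / "preprocessor_config.json") as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    ToyFeatureExtractor(size=384).save_pretrained(tmp)
    loaded = ToyFeatureExtractor.from_pretrained(tmp)

print(loaded.size)  # 384
```

Because everything it stores is a handful of numbers, there is no architecture to read about; the class method exists so that the preprocessing settings always match the checkpoint they were saved with.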

MACong · November 24, 2021, 3:06pm

Hi @nielsr,
I followed the step-by-step example in the TrOCR docs. However, I ran into a problem when running this line of code:

pixel_values = processor(images=image, return_tensors="pt").pixel_values

The error message is:

Traceback (most recent call last):
  File "./trocr_test_base_printed.py", line 14, in <module>
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
  File "/data/***/anaconda3/envs/hug_face/lib/python3.6/site-packages/transformers/models/trocr/processing_trocr.py", line 117, in __call__
    return self.current_processor(*args, **kwargs)
  File "/data/***/anaconda3/envs/hug_face/lib/python3.6/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in __call__
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/data/***/anaconda3/envs/hug_face/lib/python3.6/site-packages/transformers/models/vit/feature_extraction_vit.py", line 141, in <listcomp>
    images = [self.normalize(image=image, mean=self.image_mean, std=self.image_std) for image in images]
  File "/data/***/anaconda3/envs/hug_face/lib/python3.6/site-packages/transformers/image_utils.py", line 149, in normalize
    return (image - mean) / std
ValueError: operands could not be broadcast together with shapes (384,384) (3,)

I guess the problem is the version of transformers and the feature extractor, but I couldn’t find detailed version information. I’m currently using transformers 4.12.3.

Could you help me with that?
Many thanks

Kforcode · November 24, 2021, 3:18pm

Can you check your image’s shape and report it? It should be 3-dimensional; check whether it is.
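The shape check can be sketched with plain NumPy. The normalize function below mirrors the `(image - mean) / std` line from the traceback, but the channels-last layout and the mean/std values of 0.5 are assumptions made for illustration.

```python
import numpy as np

# Assumed per-channel statistics, shape (3,).
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])

gray = np.zeros((384, 384))    # single-channel image, no channel axis
rgb = np.zeros((384, 384, 3))  # channels-last color image


def normalize(image):
    # Same arithmetic as the line in the traceback.
    return (image - mean) / std


print(gray.ndim, rgb.ndim)   # 2 3
print(normalize(rgb).shape)  # (384, 384, 3)

try:
    normalize(gray)
    error_message = None
except ValueError as exc:
    # A (384, 384) array cannot be broadcast against a (3,) array.
    error_message = str(exc)
print(error_message)
```

A 2-dimensional array here means the image was loaded as grayscale or binary, which matches the `(384,384) (3,)` shapes in the reported error.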

MACong · November 25, 2021, 4:30am

Thanks for your reply.

I tried a local color image with 3 dimensions, and it works! Thanks!

However, when I tried the IAM image, I got the error mentioned above. Even when I followed the exact step-by-step guide, I got the same error. Have you tried the step-by-step code? Or do you have any idea how to handle a binary image input? I considered repeating the single channel to get 3 channels, but I’m not sure whether that is okay.

The step-by-step code is:

>>> from transformers import TrOCRProcessor, VisionEncoderDecoderModel
>>> import requests
>>> from PIL import Image

>>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
>>> model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

>>> # load image from the IAM dataset
>>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> generated_ids = model.generate(pixel_values)

>>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

Kforcode · November 25, 2021, 4:59am

Yes, you got it right: np.repeat along the last dimension (after adding a channel axis if the array is 2-D) should do the job.
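A minimal sketch of that conversion, assuming the image is already a 2-D NumPy array of shape (H, W); the random array is only a stand-in for a real scan:

```python
import numpy as np

# Stand-in for a grayscale or binary image of shape (H, W).
gray = np.random.randint(0, 256, size=(120, 400), dtype=np.uint8)

# Add a trailing channel axis, then copy it three times: (H, W) -> (H, W, 3).
rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=-1)

print(rgb.shape)  # (120, 400, 3)
```

All three channels hold the same values, so no information is added or lost. When the image is a PIL Image rather than an array, the .convert("RGB") call used in the step-by-step code serves the same purpose.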

MACong · November 25, 2021, 6:49am

Yeah! ~ It works! Thanks a lot !
