Why does the TrOCR processor have a feature extractor?

Thanks for your reply.

I tried a local color image with 3 channels, and it works! Thanks!

However, when I tried the IAM image, I got the above-mentioned error. Even when I followed the step-by-step guide exactly, I got the same error. Have you tried the step-by-step code? Or do you have any idea how to handle binary image input? I considered repeating the single channel to get 3 channels, but I'm not sure whether that is okay.

The step-by-step code is:

>>> from transformers import TrOCRProcessor, VisionEncoderDecoderModel
>>> import requests
>>> from PIL import Image

>>> processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
>>> model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

>>> # load image from the IAM dataset
>>> url = "https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> generated_ids = model.generate(pixel_values)

>>> generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
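
To make the channel-repeat idea concrete, here is a minimal sketch of what I have in mind. It uses only Pillow, and the helper name `to_three_channels` is my own, not part of `transformers`. It assumes that replicating the single channel via `Image.convert("RGB")` is an acceptable input for the processor, which I have not confirmed.

```python
from PIL import Image


def to_three_channels(image):
    """Return an RGB version of a PIL image (hypothetical helper).

    For single-channel modes such as "L" (grayscale) or "1" (binary),
    convert("RGB") copies the one channel into R, G and B.
    """
    if image.mode == "RGB":
        return image
    return image.convert("RGB")


# Synthetic stand-ins for a grayscale and a binary scan.
gray = Image.new("L", (8, 4), color=200)
binary = Image.new("1", (8, 4), color=1)

rgb_from_gray = to_three_channels(gray)
rgb_from_binary = to_three_channels(binary)

print(rgb_from_gray.mode, rgb_from_gray.getbands())
print(rgb_from_binary.mode, rgb_from_binary.getbands())
```

The converted image could then be passed to `processor(image, return_tensors="pt")` in place of the original one. Whether that actually avoids the error for the IAM image is exactly what I am unsure about.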