The problem is that when is_vqa is set to true, the processor automatically uses the Pix2Struct image processor, which requires an image input.
Normally, when the processor input doesn't include an image, it automatically falls back from the image processor to a plain text tokenizer. However, when is_vqa is set to true, this automatic fallback doesn't happen: the processor sticks with the image processor even if your input is text only. A simple solution is therefore to call the tokenizer directly via processor.tokenizer whenever you want to tokenize text-only inputs, which here are the ground-truth texts for the model's output:

text_inputs = processor.tokenizer(text=texts, padding="max_length", truncation=True, return_tensors="pt", add_special_tokens=True, max_length=20)
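To see the difference in isolation, here is a minimal sketch (the input string is just a placeholder; google/deplot ships with is_vqa set to true):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/deplot")

# With is_vqa=True the processor does not fall back to the plain tokenizer,
# so a text-only call like this one fails because it still routes the
# input through the image processor:
# processor(text=["some ground truth text"], return_tensors="pt")

# Calling the underlying tokenizer directly works:
text_inputs = processor.tokenizer(
    text=["some ground truth text"],
    padding="max_length",
    truncation=True,
    max_length=20,
    add_special_tokens=True,
    return_tensors="pt",
)
print(text_inputs.input_ids.shape)  # torch.Size([1, 20])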
I made some changes to the dataset and collator as follows, and this works correctly:
from PIL import Image
from torch.utils.data import Dataset
from transformers import AutoProcessor, Pix2StructForConditionalGeneration

class ChartParametersDataset(Dataset):
    def __init__(self, data_root) -> None:
        ...

    def __len__(self):
        return len(...)

    def __getitem__(self, idx):
        your_code_here
        img = Image.open('yourimagehere.jpg').convert("RGB")
        prompt = "Generate underlying data table of the figure below:"
        text = "your ground truth output text"
        inputs = {
            "text": text,      # ground-truth output text
            "prompt": prompt,  # rendered onto the image by the processor
            "image": img,
        }
        return inputs
processor = AutoProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")
def collator(batch):
    texts = [item["text"] for item in batch]
    images = [item["image"] for item in batch]
    prompts = [item["prompt"] for item in batch]

    # Tokenize the ground-truth texts directly with the tokenizer,
    # bypassing the image processor (see above).
    text_inputs = processor.tokenizer(text=texts, padding="max_length", truncation=True,
                                      return_tensors="pt", add_special_tokens=True,
                                      max_length=20)

    # The processor renders the prompt onto the image and returns the
    # flattened patches together with their attention mask.
    encoding = processor(images=images, text=prompts,
                         return_tensors="pt", add_special_tokens=True,
                         max_patches=1024)
    # print(encoding)  # uncomment to inspect the processor output

    new_batch = {
        "labels": text_inputs.input_ids,
        "flattened_patches": encoding["flattened_patches"],
        "attention_mask": encoding["attention_mask"],
    }
    return new_batch
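For completeness, here is a minimal training-loop sketch showing how the dataset and collator fit together; the data path, batch size, and optimizer hyperparameters are illustrative assumptions, not values from the original setup:

import torch
from torch.utils.data import DataLoader

dataset = ChartParametersDataset("path/to/your/data")  # placeholder path
loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed hyperparameters
model.train()

for batch in loader:
    # Pix2StructForConditionalGeneration computes the loss internally
    # when labels are provided.
    outputs = model(flattened_patches=batch["flattened_patches"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["labels"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()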