Could not fine-tune deplot model

rlee002 · July 24, 2023, 8:51am

Hi,
I am trying to fine-tune the Deplot model using the following Pix2Struct example:

huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb

The bulk of the code is almost the same but with some minor adjustments.

My dataset

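# Imports needed by the snippets below. `processor` and `MAX_PATCHES` are
# assumed to be defined elsewhere, as in the linked notebook.
import os

import torch
from PIL import Image
from torch.utils.data import Dataset

class DeplotDataset(Dataset):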
    def __init__(self, image_folder, text_folder, processor, transform=None):
        self.image_folder = image_folder
        self.text_folder = text_folder
        self.processor = processor
        self.transform = transform

        self.image_filenames = sorted(os.listdir(image_folder))
        self.text_filenames = sorted(os.listdir(text_folder))

    def __len__(self):
        return len(self.image_filenames)    

    def __getitem__(self, index):
        image_filename = self.image_filenames[index]
        text_filename = self.text_filenames[index]

        image_path = os.path.join(self.image_folder, image_filename)
        text_path = os.path.join(self.text_folder, text_filename)

        image = Image.open(image_path)
        with open(text_path, 'r') as f:
            text = f.read()

        if self.transform:
            image = self.transform(image)

        encoding = self.processor(
            images=image,
            text="Generate underlying data table of the figure below:",
            return_tensors="pt",
            add_special_tokens=True,
            max_patches=MAX_PATCHES,
        )
        
        encoding = {k:v.squeeze() for k,v in encoding.items()}
        encoding["text"] = text
        return encoding

def collator(batch):
    new_batch = {"flattened_patches":[], "attention_mask":[]}
    texts = [item["text"] for item in batch]

    text_inputs = processor(text=texts, padding="max_length", truncation=True, return_tensors="pt", add_special_tokens=True, max_length=20)

    new_batch["labels"] = text_inputs.input_ids

    for item in batch:
        new_batch["flattened_patches"].append(item["flattened_patches"])
        new_batch["attention_mask"].append(item["attention_mask"])

    new_batch["flattened_patches"] = torch.stack(new_batch["flattened_patches"])
    new_batch["attention_mask"] = torch.stack(new_batch["attention_mask"])

    return new_batch

But I get the following error:

ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got <class 'NoneType'>.

jpjp9292 · January 9, 2024, 1:40am

Actually, I am having the same issue. Could anyone help with this problem?

rlee002 · January 10, 2024, 2:06am

This solution helped in my case

Pengyu965 · January 10, 2024, 4:12am

Check out here: ValueError: Invalid image type. Expected either PIL.Image.Image, numpy.ndarray, torch.Tensor, tf.Tensor or jax.ndarray, but got - #5 by Pengyu965
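
For readers who cannot follow the link, here is a minimal sketch of one likely fix. It is an assumption based on the error message, not a quote from the linked post. The collator in the question calls processor(text=texts, ...) without images. For VQA-style Pix2Struct checkpoints such as Deplot, the processor appears to forward that call to the image processor, which then receives None as the image. Tokenizing the target texts with the tokenizer directly should avoid that path. The helper name make_collator and its stack and max_length parameters are hypothetical, introduced only for illustration.

```python
def make_collator(tokenizer, stack, max_length=512):
    """Build a collate function that tokenizes the target texts with `tokenizer`.

    tokenizer:  a callable such as `processor.tokenizer` (assumption: calling the
                tokenizer directly bypasses the image processor entirely).
    stack:      a callable that batches a list of tensors, e.g. `torch.stack`.
    max_length: hypothetical label length; choose it to fit your data tables.
    """
    def collator(batch):
        # Target data tables stored by the dataset under the "text" key.
        texts = [item["text"] for item in batch]

        # Same tokenizer arguments as in the question, but sent to the
        # tokenizer instead of the full processor.
        text_inputs = tokenizer(
            text=texts,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
            add_special_tokens=True,
            max_length=max_length,
        )

        return {
            "labels": text_inputs.input_ids,
            "flattened_patches": stack([item["flattened_patches"] for item in batch]),
            "attention_mask": stack([item["attention_mask"] for item in batch]),
        }

    return collator
```

With the objects from the question, this would be used as collator = make_collator(processor.tokenizer, torch.stack) and passed to the DataLoader as collate_fn. Note that max_length=20 in the original collator would probably truncate most data tables, so a larger value may be worth trying.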
