I would like to fine-tune the BLIP model on the ROCO dataset for image captioning of chest X-rays.

This is the code:

import evaluate
from transformers import (
    BlipForConditionalGeneration,
    BlipProcessor,
    Trainer,
    TrainingArguments,
)

# get_data() and compute_metrics() are defined earlier in the script (omitted here)
trainimgs, traincapts, testimgs, testcapts = get_data()

model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

metric = evaluate.load("accuracy")

# Preprocess the images and captions into tensors
traindata = processor(text=traincapts, images=trainimgs, return_tensors="pt", padding=True, truncation=True)
evaldata = processor(text=testcapts, images=testimgs, return_tensors="pt", padding=True, truncation=True)

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=traindata,
    eval_dataset=evaldata,
    compute_metrics=compute_metrics,
)
trainer.train()

Error:


***** Running training *****
  Num examples = 3
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 3
  Number of trainable parameters = 469732924
Traceback (most recent call last):

  File "D:\NioyaTech\image_capt.py", line 82, in <module>
    trainer.train()

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\transformers\trainer.py", line 1547, in train
    ignore_keys_for_eval=ignore_keys_for_eval,

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\transformers\trainer.py", line 1765, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]

  File "C:\Users\omair\anaconda3\envs\torch\lib\site-packages\transformers\feature_extraction_utils.py", line 86, in __getitem__
    raise KeyError("Indexing with integers is not available when using Python based feature extractors")

KeyError: 'Indexing with integers is not available when using Python based feature extractors'

Can you please explain what’s causing the error and how to rectify it?
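
My current guess is that processor(...) returns a single BatchFeature/BatchEncoding holding the whole split, which the Trainer's DataLoader cannot index with integers, whereas Trainer expects train_dataset[idx] to return one example. Below is a minimal sketch of the kind of wrapper I think might be needed; the class name ROCOCaptionDataset and the use of input_ids as labels are my own assumptions rather than anything from the BLIP documentation. Is this the right direction?

from torch.utils.data import Dataset

class ROCOCaptionDataset(Dataset):
    """Wraps the BatchFeature from the processor so each example can be indexed."""

    def __init__(self, encodings):
        # encodings holds the pixel_values, input_ids and attention_mask tensors
        self.encodings = encodings

    def __len__(self):
        return self.encodings["pixel_values"].shape[0]

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        # Assumption: the caption tokens double as labels for the captioning loss
        item["labels"] = item["input_ids"].clone()
        return item

# Then pass these to Trainer instead of the raw BatchFeature objects
train_dataset = ROCOCaptionDataset(traindata)
eval_dataset = ROCOCaptionDataset(evaldata)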