Can I use a DataLoader for image and text processing with ViltProcessor?

I want to fine-tune ViLT (a vision-language model) for my task. Each sample in my dataset consists of 10 images paired with 1 text. For ViltForImagesAndTextClassification, I can increase the number of images using ViltConfig, but I am not able to preprocess the dataset with ViltProcessor through a DataLoader.

Is it possible to pass images and text to ViltProcessor in a batch? If so, could anyone show me how to do that?
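
To make the question concrete, here is a minimal sketch of the kind of batching I have in mind, not a verified solution. It assumes each dataset item is a dict with the hypothetical keys "images" (a list of PIL images), "text" (a string), and "label" (an int), and that the processor can be called on a flat list of images plus a list of texts and returns "pixel_values" (and possibly "pixel_mask") with one row per image. The helper name make_vilt_collate_fn is my own and is not part of transformers.

```python
def make_vilt_collate_fn(processor, num_images):
    """Build a collate_fn that runs a ViltProcessor-style callable on a whole batch.

    Assumed (hypothetical) item layout:
        {"images": [num_images PIL images], "text": str, "label": int}
    """

    def collate_fn(batch):
        texts = [item["text"] for item in batch]

        # Flatten the images sample by sample so they can be processed in one call.
        flat_images = []
        for item in batch:
            if len(item["images"]) != num_images:
                raise ValueError(
                    f"expected {num_images} images per sample, got {len(item['images'])}"
                )
            flat_images.extend(item["images"])

        # One processor call: texts are tokenized and padded, images are resized and padded.
        encoding = dict(
            processor(images=flat_images, text=texts, padding=True, return_tensors="pt")
        )

        # Regroup the images: (batch * num_images, ...) -> (batch, num_images, ...).
        batch_size = len(batch)
        for key in ("pixel_values", "pixel_mask"):
            if key in encoding:
                tensor = encoding[key]
                encoding[key] = tensor.reshape(batch_size, num_images, *tensor.shape[1:])

        # Labels are kept as a plain list here; convert them to a tensor before the loss.
        encoding["labels"] = [item["label"] for item in batch]
        return encoding

    return collate_fn
```

The idea would then be to pass `collate_fn=make_vilt_collate_fn(processor, num_images=10)` to `torch.utils.data.DataLoader`, so that each batch already has `pixel_values` shaped (batch_size, num_images, channels, height, width), which is the layout ViltForImagesAndTextClassification expects. Is this the right approach?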

Thanks in advance.